Wildcards and System 2 Thinking

Welcome Wildcards!

From v2.8.0, DiscrimiNAT Firewall supports use of wildcards in an FQDN allowlist.

tip

If you're in an operations role (SRE, DevOps, etc) the Operation section will be of particular interest to you.

Syntax

➟ The character _ (or even ? on GCP) may be used in place of exactly one character in an FQDN to be allowed.

➟ Each _ (or ?) character must match exactly one character that is permissible in a domain name. It cannot match zero characters or more than one character.

➟ The set of wild characters is from a to z, 0 to 9 and the - (hyphen or minus) only.

➟ The . (period, dot or full stop) character is not included.

➟ You may use any number of wildcards in a single FQDN address (in the allowlist). See examples below.

➟ They are supported for the TLS protocol only.

note

Wildcard pattern based rules are not a good substitute for your current, well-known FQDN based rules. See the Caveats section for why.

Examples

discriminat:tls:???.bbc.co.uk

will match:
www.bbc.co.uk
123.bbc.co.uk
x-9.bbc.co.uk

will not match:
bbc.co.uk
smtp.bbc.co.uk
s3.bbc.co.uk
w.3.bbc.co.uk


discriminat:tls:b?-kjr-312q4.us-??st?.gcp.confluent.cloud

will match:
b1-kjr-312q4.us-west2.gcp.confluent.cloud
b9-kjr-312q4.us-east4.gcp.confluent.cloud
b--kjr-312q4.us-east1.gcp.confluent.cloud

will not match:
b9-kjr-312q4.us-east.gcp.confluent.cloud
b42-kjr-312q4.us-east4.gcp.confluent.cloud
b-kjr-312q4.us-east4.gcp.confluent.cloud
kjr-312q4.us-east4.gcp.confluent.cloud


discriminat:tls:?????.???

will match:
scone.com
scone.net
scone.org
buddy.dog

will not match:
medic.co.uk
buster.co
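
The matching rules above can be approximated in a few lines of Python. This is a minimal sketch for checking candidate patterns against observed FQDNs before adding them to an allowlist. It is not DiscrimiNAT's implementation: the function names are hypothetical, and the case-insensitive comparison is an assumption.

```python
import re

# One wildcard stands for exactly one of: a-z, 0-9 or hyphen (never the dot).
WILD_CLASS = "[a-z0-9-]"


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcarded FQDN (using _ or ?) into a compiled regex."""
    parts = []
    for ch in pattern.lower():  # assumption: matching is case-insensitive
        if ch in "_?":
            parts.append(WILD_CLASS)
        else:
            parts.append(re.escape(ch))  # dots and hyphens match literally
    return re.compile("".join(parts))


def matches(pattern: str, fqdn: str) -> bool:
    """Return True if the whole FQDN matches the wildcarded pattern."""
    return pattern_to_regex(pattern).fullmatch(fqdn.lower()) is not None


if __name__ == "__main__":
    for name in ("www.bbc.co.uk", "smtp.bbc.co.uk", "w.3.bbc.co.uk"):
        print(name, matches("???.bbc.co.uk", name))
```

Because each wildcard maps to a single-character class and the whole name must match, a pattern can only ever match FQDNs of exactly its own length.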


Operation

The following describes how DiscrimiNAT dynamically adds rules to itself based on wildcard patterns defined by the user and any observed traffic matching those patterns.

  1. Upon first allowlisting a wildcarded FQDN, a -config log entry will be emitted as follows from all DiscrimiNAT instances:
    {"cat": "wildcard-addr",
    "gid": "foo",
    "proto": "tls",
    "instance": "i-bar-xyz",
    "addr": "???.bbc.co.uk",
    "outcome": "accepted"}
  2. When an application first tries to access an FQDN that would match a wildcarded pattern, a rule for that FQDN is dynamically added to the DiscrimiNAT instance through which this occurred. That instance will emit another -config log entry as follows. The private IP of the network interface/VM that initiated this outbound request was 192.168.101.6 in this example:
    {"cat": "addr",
    "gid": "foo",
    "outcome": "accepted",
    "instance": "i-bar-xyz",
    "reason": "initiated by 192.168.101.6 causing pattern `???.bbc.co.uk` to match",
    "addr": "www.bbc.co.uk",
    "proto": "tls"}
  3. At this time, the packet will be rejected with the reason cache not ready in the -flow log, as follows. This state can last from 10 to 90 seconds, during which the application will time out rather than receive a connection reset:
    {"dpt": 443,
    "instance": "i-bar-xyz",
    "proto_v": "1.3",
    "cat": "client",
    "reason": "cache not ready",
    "spt": 41200,
    "src": "192.168.101.6",
    "dhost": "www.bbc.co.uk",
    "outcome": "disallowed",
    "dst": "203.0.113.9",
    "proto": "tls"
    }
  4. After 10 to 90 seconds, the FQDN will work as expected. Note that this process needs to occur on each DiscrimiNAT instance; as noted in the Caveats below, fixing this is our top priority for the next release.
    {"dhost": "www.bbc.co.uk",
    "dpt": 443,
    "dst": "203.0.113.9",
    "proto": "tls",
    "cat": "client",
    "instance": "i-bar-xyz",
    "src": "192.168.101.6",
    "reason": "matching rule found in foo",
    "outcome": "allowed",
    "spt": 58412,
    "proto_v": "1.3"}

The reason for following this process is so that DNS results for the given FQDNs can be validated and cached, a process that needs a little warm-up time for our proprietary Wormhole DNS technology. This is crucial for safety and for the prevention of trivial bypasses such as SNI spoofing. See our litmus test and an independent, third-party blog post on such a bypass in AWS Network Firewall.
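
Since the first connection to a newly matched FQDN times out during the warm-up window, an application (or a deployment smoke test) may want to retry for up to 90 seconds before giving up. The following is a minimal client-side sketch of that idea. The function names, the 5-second pause and per-attempt timeout are illustrative assumptions, not part of DiscrimiNAT.

```python
import socket
import ssl
import time


def _tls_connect(host, port, timeout):
    """Open a TLS connection, sending the host name as SNI."""
    ctx = ssl.create_default_context()
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        return ctx.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise


def connect_with_warmup(host, port=443, max_wait=90.0,
                        attempt_timeout=5.0, pause=5.0, connect=None):
    """Retry a connection until it succeeds or max_wait seconds have passed.

    max_wait defaults to the upper bound of the documented 10 to 90 second
    warm-up; attempt_timeout and pause are arbitrary illustrative values.
    """
    connect = connect or _tls_connect
    deadline = time.monotonic() + max_wait
    while True:
        try:
            return connect(host, port, attempt_timeout)
        except OSError:  # covers timeouts and TLS errors
            if time.monotonic() + pause > deadline:
                raise  # out of time: surface the last error
            time.sleep(pause)
```

A call such as connect_with_warmup("www.bbc.co.uk") returns a connected TLS socket once the rule is active. The connect parameter exists only so the retry loop can be exercised without a network.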

Caveats

  1. As of v2.8.0, the dynamically added list of FQDNs, having matched a wildcarded pattern, is ephemeral per instance of DiscrimiNAT. This issue will be resolved with the highest priority for the next release. (Note: still unresolved as of v2.9.0)
  2. Each instance of DiscrimiNAT will need to witness a matching packet trying to pass through before it warms up its cache for it. This issue will also be resolved with the highest priority for the next release.
  3. It can take from 10 to 90 seconds for a dynamically added FQDN to be able to pass through successfully for the first time.
  4. Wildcards only work for the TLS protocol.
  5. In v2.8.0, the * (asterisk or splat) character, which would match zero or more characters, is not supported. This is for your own safety and is discussed in depth in the Philosophy section below.
  6. From v2.9.0, the * (asterisk or splat) character is supported, subject to the wildcard_exposure preference.

caution

It is recommended to use at least machine size n2-highcpu-2/n2d-highcpu-2 on GCP or c6i.large/c6a.large on AWS when using wildcards.
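
Because dynamically added FQDNs are ephemeral per instance (caveat 1), one way to keep track of which names a pattern has actually let through is to collect them from the -config log. The sketch below assumes the log is available with one JSON object per line and that the reason field has the wording shown in the Operation section; the function name and the output shape are hypothetical.

```python
import json
import re

# Wording taken from the example -config entry; assumed stable for this sketch.
REASON_RE = re.compile(
    r"initiated by (?P<src>\S+) causing pattern `(?P<pattern>[^`]+)` to match"
)


def dynamic_rules(lines):
    """Collect FQDNs that were added because they matched a wildcard pattern."""
    found = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        if not isinstance(entry, dict):
            continue
        if entry.get("cat") != "addr" or entry.get("outcome") != "accepted":
            continue
        match = REASON_RE.search(entry.get("reason", ""))
        if match:
            found.append({
                "addr": entry.get("addr"),
                "pattern": match["pattern"],
                "src": match["src"],
                "instance": entry.get("instance"),
            })
    return found
```

Grouping the result by pattern shows how broad each wildcard has turned out to be in practice, and whether any of the matched names should become plain FQDN rules instead.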

Philosophy

If we look at the internet as a menagerie of different creatures, we can see that some are our pets, some we want to be friends with, some we don't care about, and some can be outright dangerous.

If computers could tell all of the above apart on their own, they would be able to understand our intent from a single * glob character.

Effective egress control is akin to picking the individual animals to engage with rather than whole zoos. For example, panda.mymate.zoo and zebra.mymate.zoo are entities you want to have a channel open with, but lion.mymate.zoo not so much. The pelican over at friend.zoo (pelican.friend.zoo) is okay too, but perhaps not deer.friend.zoo at this time. Then there's snake.foe.zoo, or generally anything at the apex domain foe.zoo, which you want to avoid contact with, perhaps with the exception of magpie.foe.zoo for the time being due to some history.

These individual animals are tenants at their respective zoos, much like the Cloud today, where individual, unconnected, private entities (anybody with a credit card) can be a tenant at one of AWS, GCP, Cloudflare, Fastly, Akamai, Salesforce, etc. with their own dedicated subdomain.

System 1

Faced with an overwhelming number of subdomains observed in our egress traffic to a particular apex domain, we could be tempted to allow access to the entire apex domain, for example *.s3.amazonaws.com or *.githubusercontent.com. If the syntax permitted it, this could seem to be the best way to balance the time spent on allowlists for outbound traffic against their security.

That is System 1 thinking.

The answer is not $1.00. Image credit: https://www.themarketingstudent.com/thinking-fast-and-slow/

🤨 Dear ChatGPT, can you very, very concisely describe System 1 and System 2 from Kahneman's Thinking, Fast and Slow?

🤖 System 1 is fast, automatic, and intuitive thinking. System 2 is slow, deliberate, and analytical thinking.

🤨 Okay, bit more.

🤖 System 1 operates quickly and effortlessly, relying on intuition and heuristics to make decisions without conscious thought. It's prone to biases and errors but is efficient for routine tasks.

🤖 System 2, on the other hand, is slow, deliberate, and requires conscious effort. It engages in logical reasoning, problem-solving, and critical thinking, helping to correct or override the instincts of System 1, though it's more mentally taxing.


To continue to deliver a product that is safe to operate by engineers of any specialisation, we knew we had to invoke the System 2 part of the brain and prevent an overly permissive, multi-tenant pattern from entering the allowlists.

Data

We now have over 100 million lines of log samples, thanks to the customers who've chosen to leave Automated System Health Reporting turned on. This data suggests that, most of the time, unpredictable subdomains (ones that cannot be known in advance) tend to be FQDNs of CDNs or APIs for client-side load balancing. Some examples:

7-32-3-app.agent.datadoghq.com
cdn02.quay.io
b9-kjr-312q4.us-east4.gcp.confluent.cloud
repeater-local-007-ec2-use1e-prod.browserstack.com

Using System 2, the invariant part of these examples can be rather quickly spotted!

7-32-3-app.agent.datadoghq.com: Datadog's documentation confirms the pattern is <VERSION>-app.agent.datadoghq.com. By restricting the use of * and only allowing the match of exactly one character at a time, this could be defined as ?-??-?-app.agent.datadoghq.com. If logs from dry-run/monitor mode show a single digit in the middle position as well, then another pattern could be added to the allowlist: ?-?-?-app.agent.datadoghq.com.

cdn02.quay.io: Red Hat's documentation lists the possible variations. The patterns would be cdn.quay.io and cdn??.quay.io.

b9-kjr-312q4.us-east4.gcp.confluent.cloud: In this case, another FQDN, which is a substring of this one, was also seen passing through the firewall: kjr-312q4.us-east4.gcp.confluent.cloud. This would appear to be the tenant ID of the customer this log line is from. (Original alphanumeric identifiers have been changed to prevent information disclosure.) The number after the very first b goes from 0 to 26. So, the patterns for accessing this Kafka cluster would be kjr-312q4.us-east4.gcp.confluent.cloud, b?-kjr-312q4.us-east4.gcp.confluent.cloud and b??-kjr-312q4.us-east4.gcp.confluent.cloud.

repeater-local-007-ec2-use1e-prod.browserstack.com: With similar FQDNs accessed, such as repeater-local-008-dcp-use3a-prod.browserstack.com and repeater-local-052-ec2-use3b-prod.browserstack.com, System 1 could've defaulted to *.browserstack.com. With the option of * taken away, System 2 will see that the pattern is repeater-local-???-???-us???-prod.browserstack.com. The use1e, use3a and use3b parts are geographies, since cac1a, euw1a and apse2a were also observed in that position. System 2 may well have prevented your data from being processed in a location it wasn't meant to be.

Other potential candidates include, for example, 890831537354-dot-europe-west2.kernels.googleusercontent.com. The first group of digits appears to be a tenant's numeric project ID, and therefore it's best not to replace it with a *, as that would match all possible tenant projects on GCP!

The more rigorous, System 2 invoking syntax seems to cover all the use-cases from the data we have well.
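
Spotting the invariant part of a set of observed FQDNs can be partly mechanised. The sketch below is a hypothetical helper, not a DiscrimiNAT feature: it groups names that share a parent domain and have a first label of the same length, then replaces only the character positions that vary with ?. The result is a starting point for System 2 to review, not a finished rule.

```python
from collections import defaultdict


def suggest_patterns(fqdns, wildcard="?"):
    """Suggest single-character wildcard patterns from observed FQDNs.

    Only the first (leftmost) label is ever wildcarded, and only names with
    the same parent domain and first-label length are merged together.
    """
    groups = defaultdict(set)
    for name in fqdns:
        first, _, parent = name.lower().rstrip(".").partition(".")
        groups[(len(first), parent)].add(first)

    patterns = []
    for (_, parent), firsts in groups.items():
        # Compare the labels column by column; keep characters that never vary.
        merged = "".join(
            column[0] if len(set(column)) == 1 else wildcard
            for column in zip(*firsts)
        )
        patterns.append(f"{merged}.{parent}" if parent else merged)
    return sorted(patterns)


if __name__ == "__main__":
    observed = [
        "repeater-local-007-ec2-use1e-prod.browserstack.com",
        "repeater-local-008-dcp-use3a-prod.browserstack.com",
        "repeater-local-052-ec2-use3b-prod.browserstack.com",
    ]
    for pattern in suggest_patterns(observed):
        print(pattern)
```

A suggestion is only as wide as the sample it was built from, so it will usually be narrower than a pattern written by hand, such as the repeater-local-???-???-us???-prod.browserstack.com pattern above. Widening it position by position remains a deliberate decision.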

Feedback

At this time, we believe that by offering a way to specify patterns that is safer than the match zero or more characters convention of the * symbol, we're enabling our customers to simplify the management of their rule-sets whilst addressing legitimate challenges with variable addressing. We remain, of course, open to critical feedback and willing to change, provided relevant data and challenges are shared with us.

Do write to us with your thoughts.

tip

If you're not familiar yet, we have 3½ minute videos (AWS, GCP), with copyable CLI commands underneath, that can help you discover and extract the list of FQDNs particular apps in your environment have been accessing in a format that can be used straightaway in the syntax, speeding up your IaC iterations.

Bibliography

Image credit: https://us.macmillan.com/books/9780374275631/thinkingfastandslow