Welcome Wildcards!
From v2.8.0, DiscrimiNAT Firewall supports the use of wildcards in an FQDN allowlist.
If you're in an operations role (SRE, DevOps, etc.), the Operation section below will be of particular interest to you.
Syntax
➟ The character _ (or even ? on GCP) may be used in an allowlisted FQDN to stand in for exactly one wild character.
➟ Each _ (or ?) character must match exactly one character permissible in a domain name. It cannot match zero characters, nor more than one.
➟ The set of wild characters is a to z, 0 to 9 and the - (hyphen or minus) only.
➟ The . (period, dot or full stop) character is not included, so a wildcard can never match a period.
➟ You may use any number of wildcards in a single FQDN address (in the allowlist). See examples below.
➟ They are supported for the TLS protocol only.
Wildcard-pattern-based rules are not a good substitute for your current, well-known FQDN-based rules. See the Caveats section for why.
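The Syntax rules map directly onto a regular expression: each wildcard becomes a one-character class, everything else is matched literally, and the whole FQDN must match. The sketch below is a minimal illustration, not DiscrimiNAT's implementation. It assumes the discriminat:tls: prefix has already been stripped and that hostnames arrive in lowercase; pattern_to_regex and matches are hypothetical helper names.

```python
import re

# Characters a single wildcard may stand for, per the Syntax rules:
# a to z, 0 to 9 and the hyphen. The period is deliberately absent.
WILD_CLASS = "[a-z0-9-]"


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate an allowlist pattern (prefix already stripped) into a regex.

    Each _ or ? becomes exactly one WILD_CLASS character; every other
    character is escaped and matched literally."""
    parts = [WILD_CLASS if ch in "_?" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts))


def matches(pattern: str, fqdn: str) -> bool:
    # fullmatch anchors both ends, so no extra leading or trailing labels
    # can slip through (e.g. "www.bbc.co.uk.evil.example").
    return pattern_to_regex(pattern).fullmatch(fqdn) is not None


print(matches("???.bbc.co.uk", "www.bbc.co.uk"))   # True
print(matches("___.bbc.co.uk", "x-9.bbc.co.uk"))   # True
print(matches("???.bbc.co.uk", "smtp.bbc.co.uk"))  # False
```

Anchoring with fullmatch is the important design choice here. A plain search would accept any hostname that merely contains a matching substring.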
Examples
discriminat:tls:???.bbc.co.uk
will match:
www.bbc.co.uk
123.bbc.co.uk
x-9.bbc.co.uk
will not match:
bbc.co.uk
smtp.bbc.co.uk
s3.bbc.co.uk
w.3.bbc.co.uk
discriminat:tls:b?-kjr-312q4.us-??st?.gcp.confluent.cloud
will match:
b1-kjr-312q4.us-west2.gcp.confluent.cloud
b9-kjr-312q4.us-east4.gcp.confluent.cloud
b--kjr-312q4.us-east1.gcp.confluent.cloud
will not match:
b9-kjr-312q4.us-east.gcp.confluent.cloud
b42-kjr-312q4.us-east4.gcp.confluent.cloud
b-kjr-312q4.us-east4.gcp.confluent.cloud
kjr-312q4.us-east4.gcp.confluent.cloud
discriminat:tls:?????.???
will match:
scone.com
scone.net
scone.org
buddy.dog
will not match:
medic.co.uk
buster.co
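Each wildcard stands for exactly one character and every other character must match literally, so a pattern and any FQDN it matches always have the same length. The hypothetical checker below relies on that property to avoid regular expressions. It replays every example above as a self-check, with the discriminat:tls: prefix omitted. It is a sketch of the stated rules, not DiscrimiNAT's own code.

```python
import string

# One wildcard may stand for any of these characters (the period is excluded).
WILD_SET = frozenset(string.ascii_lowercase + string.digits + "-")


def matches(pattern: str, fqdn: str) -> bool:
    """Hypothetical checker: compare pattern and FQDN position by position."""
    if len(pattern) != len(fqdn):
        return False  # a wildcard can never absorb zero or several characters
    return all(
        c in WILD_SET if p in "_?" else p == c
        for p, c in zip(pattern, fqdn)
    )


# pattern -> (should match, should not match), copied from the examples above
EXAMPLES = {
    "???.bbc.co.uk": (
        ["www.bbc.co.uk", "123.bbc.co.uk", "x-9.bbc.co.uk"],
        ["bbc.co.uk", "smtp.bbc.co.uk", "s3.bbc.co.uk", "w.3.bbc.co.uk"],
    ),
    "b?-kjr-312q4.us-??st?.gcp.confluent.cloud": (
        [
            "b1-kjr-312q4.us-west2.gcp.confluent.cloud",
            "b9-kjr-312q4.us-east4.gcp.confluent.cloud",
            "b--kjr-312q4.us-east1.gcp.confluent.cloud",
        ],
        [
            "b9-kjr-312q4.us-east.gcp.confluent.cloud",
            "b42-kjr-312q4.us-east4.gcp.confluent.cloud",
            "b-kjr-312q4.us-east4.gcp.confluent.cloud",
            "kjr-312q4.us-east4.gcp.confluent.cloud",
        ],
    ),
    "?????.???": (
        ["scone.com", "scone.net", "scone.org", "buddy.dog"],
        ["medic.co.uk", "buster.co"],
    ),
}

for pattern, (good, bad) in EXAMPLES.items():
    for fqdn in good:
        assert matches(pattern, fqdn), (pattern, fqdn)
    for fqdn in bad:
        assert not matches(pattern, fqdn), (pattern, fqdn)
print("all examples behave as documented")
```

Note that buster.co has the same length as ?????.??? but still fails: its sixth character is r, where the pattern requires a literal period.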
Operation
The following describes how DiscrimiNAT dynamically adds rules to itself based on wildcard patterns defined by the user and any observed traffic matching those patterns.
- Upon first allowlisting a wildcarded FQDN, all DiscrimiNAT instances will emit a -config log entry as follows:
{"cat": "wildcard-addr",
"gid": "foo",
"proto": "tls",
"instance": "i-bar-xyz",
"addr": "???.bbc.co.uk",
"outcome": "accepted"}
- When an application first tries to access an FQDN that matches a wildcarded pattern, a rule for that FQDN is dynamically added to the DiscrimiNAT instance through which the attempt passed. That instance will emit another -config log entry as follows. In this example, the private IP of the network interface/VM that initiated the outbound request was 192.168.101.6:
{"cat": "addr",
"gid": "foo",
"outcome": "accepted",
"instance": "i-bar-xyz",
"reason": "initiated by 192.168.101.6 causing pattern `???.bbc.co.uk` to match",
"addr": "www.bbc.co.uk",
"proto": "tls"}
- At this time, the packet will be rejected with the reason "cache not ready" in the -flow log, as follows. This state can last from 30 to 90 seconds:
{"dpt": 443,
"instance": "i-bar-xyz",
"proto_v": "1.3",
"cat": "client",
"reason": "cache not ready",
"spt": 41200,
"src": "192.168.101.6",
"dhost": "www.bbc.co.uk",
"outcome": "disallowed",
"dst": "203.0.113.9",
"proto": "tls"}
- After 30 to 90 seconds, the FQDN will work as expected, and the -flow log will show the connection as allowed. Note that this process needs to occur on each DiscrimiNAT instance. As noted in the Caveats below, we're working on fixing this as a top priority for the next release.
{"dhost": "www.bbc.co.uk",
"dpt": 443,
"dst": "203.0.113.9",
"proto": "tls",
"cat": "client",
"instance": "i-bar-xyz",
"src": "192.168.101.6",
"reason": "matching rule found in foo",
"outcome": "allowed",
"spt": 58412,
"proto_v": "1.3"}
The reason for this process is that DNS results for the given FQDNs can then be validated and cached, which needs a little warm-up time in our proprietary Wormhole DNS technology. This validation is crucial for safety and for preventing trivial bypasses such as SNI spoofing. See our litmus test and an independent, third-party blog post on such a bypass in AWS Network Firewall.
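To review which concrete FQDNs your patterns are actually admitting, the "addr" entries in the -config log can be tallied. The sketch below is an assumption-heavy illustration, not a supported tool. It assumes the log entries have been exported from wherever your logs land as one JSON object per line. It also assumes the reason text keeps the exact shape seen in the example above, which is not a documented interface. dynamic_additions is a hypothetical helper name.

```python
import json
import re

# Shape of the "reason" text in the example -config entry above; assumed
# to stay stable across releases (it is not a documented interface).
REASON_RE = re.compile(r"initiated by (\S+) causing pattern `([^`]+)` to match")


def dynamic_additions(lines):
    """Yield (instance, pattern, fqdn, initiator) for every FQDN that an
    instance added to itself because it matched a wildcarded pattern."""
    for line in lines:
        entry = json.loads(line)
        if entry.get("cat") != "addr" or entry.get("outcome") != "accepted":
            continue  # e.g. the "wildcard-addr" entry for the pattern itself
        m = REASON_RE.search(entry.get("reason", ""))
        if m:  # static "addr" rules carry no pattern-match reason
            initiator, pattern = m.groups()
            yield entry["instance"], pattern, entry["addr"], initiator


# The two -config entries from the walkthrough above, one JSON object per line.
sample = [
    json.dumps({"cat": "wildcard-addr", "gid": "foo", "proto": "tls",
                "instance": "i-bar-xyz", "addr": "???.bbc.co.uk",
                "outcome": "accepted"}),
    json.dumps({"cat": "addr", "gid": "foo", "outcome": "accepted",
                "instance": "i-bar-xyz",
                "reason": "initiated by 192.168.101.6 causing pattern "
                          "`???.bbc.co.uk` to match",
                "addr": "www.bbc.co.uk", "proto": "tls"}),
]

for row in dynamic_additions(sample):
    print(row)
# ('i-bar-xyz', '???.bbc.co.uk', 'www.bbc.co.uk', '192.168.101.6')
```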
Caveats
- As of v2.8.0, the list of FQDNs dynamically added after matching a wildcarded pattern is ephemeral and kept separately by each DiscrimiNAT instance. This issue will be resolved with the highest priority for the next release.
- Each DiscrimiNAT instance needs to see a matching packet trying to pass through before it warms up its cache for that FQDN. This issue will also be resolved with the highest priority for the next release.
- It can take from 30 to 90 seconds for a dynamically added FQDN to be able to pass through successfully for the first time.
- Wildcards only work for the TLS protocol.
- The * (asterisk or splat) character, which would match zero or more characters, is not supported. This is for your own safety and is discussed in depth in the Philosophy section below.
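Each instance warms up on its own, so it can help to see which instances are still rejecting a given host with "cache not ready". The sketch below works from -flow log entries like the ones in the Operation section. It assumes one JSON object per line in chronological order, so the latest outcome for each (instance, dhost) pair wins. warmup_status is a hypothetical helper, and the second instance ID i-bar-abc is invented for illustration.

```python
import json


def warmup_status(lines):
    """Return {(instance, dhost): "allowed" | "warming"} from -flow entries.

    Assumes lines are in chronological order, so the latest outcome for each
    (instance, dhost) pair overwrites earlier ones."""
    status = {}
    for line in lines:
        entry = json.loads(line)
        if entry.get("cat") != "client":
            continue
        key = (entry["instance"], entry["dhost"])
        if entry.get("outcome") == "allowed":
            status[key] = "allowed"
        elif entry.get("reason") == "cache not ready":
            status[key] = "warming"
    return status


flow = [
    # i-bar-xyz: rejected while warming up, then allowed (trimmed from above)
    {"cat": "client", "instance": "i-bar-xyz", "dhost": "www.bbc.co.uk",
     "outcome": "disallowed", "reason": "cache not ready"},
    {"cat": "client", "instance": "i-bar-xyz", "dhost": "www.bbc.co.uk",
     "outcome": "allowed", "reason": "matching rule found in foo"},
    # i-bar-abc: a hypothetical second instance that has only seen one packet
    {"cat": "client", "instance": "i-bar-abc", "dhost": "www.bbc.co.uk",
     "outcome": "disallowed", "reason": "cache not ready"},
]

print(warmup_status(json.dumps(e) for e in flow))
# {('i-bar-xyz', 'www.bbc.co.uk'): 'allowed', ('i-bar-abc', 'www.bbc.co.uk'): 'warming'}
```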
It is recommended to use at least machine size n2-highcpu-2/n2d-highcpu-2 on GCP or c6i.large/c6a.large on AWS when using wildcards.
Philosophy
If we look at the internet as a menagerie of different creatures, we can see that some are our pets, some we want to be friends with, some we don't care about, and some can be outright dangerous.
If our intent were to match all of the above, the * glob character would express it perfectly well to a computer.
Effective egress control is akin to picking the individual animals to engage with rather than whole zoos. For example, panda.mymate.zoo and zebra.mymate.zoo are entities you want to have a channel open with, but lion.mymate.zoo not so much. pelican.friend.zoo is okay too, but perhaps not deer.friend.zoo at this time. Then there's snake.foe.zoo, or generally anything under the apex domain foe.zoo, that you want to avoid contact with, perhaps with the exception of magpie.foe.zoo for the time being due to some history.
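The short comparison below makes the analogy concrete. It uses Python's fnmatch as a stand-in for a "zero or more characters" glob. This is an assumption for illustration only: DiscrimiNAT has no * syntax, and fnmatch's own ? also matches a period, unlike DiscrimiNAT's wildcard. The comparison contrasts that glob with allowlisting individual animals.

```python
from fnmatch import fnmatchcase

# fnmatch's * stands in for a "zero or more characters" glob here; it is NOT
# DiscrimiNAT syntax. fnmatch's * also happily matches across periods.
glob_rule = "*.mymate.zoo"
explicit_rules = {"panda.mymate.zoo", "zebra.mymate.zoo"}

hosts = ["panda.mymate.zoo", "zebra.mymate.zoo", "lion.mymate.zoo",
         "cage.lion.mymate.zoo"]

# host -> (allowed by the glob, allowed by the explicit list)
results = {h: (fnmatchcase(h, glob_rule), h in explicit_rules) for h in hosts}
for host, (by_glob, by_list) in results.items():
    print(f"{host:22} glob={by_glob!s:5} explicit={by_list}")
```

The glob admits every host in the list, including the lion and anything nested beneath it. The explicit list admits only the panda and the zebra.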
These individual animals are tenants at their respective zoos, much like the Cloud today, where individual, unconnected, private entities (anybody with a credit card) can each be a tenant at AWS, GCP, Cloudflare, Fastly, Akamai, Salesforce, etc., with their own dedicated subdomain.
System 1
With an overwhelming number of subdomains observed in our egress traffic to a particular apex domain, we could be tempted to allow access to the entire apex domain, for example *.s3.amazonaws.com or *.githubusercontent.com. If the syntax permitted it, this could seem the best way to balance the time spent on allowlists for outbound traffic against their security.
That is System 1 thinking.
🤨 Dear ChatGPT, can you very, very concisely describe System 1 and System 2 from Kahneman's Thinking, Fast and Slow?
🤖 System 1 is fast, automatic, and intuitive thinking. System 2 is slow, deliberate, and analytical thinking.
🤨 Okay, bit more.
🤖 System 1 operates quickly and effortlessly, relying on intuition and heuristics to make decisions without conscious thought. It's prone to biases and errors but is efficient for routine tasks.
🤖 System 2, on the other hand, is slow, deliberate, and requires conscious effort. It engages in logical reasoning, problem-solving, and critical thinking, helping to correct or override the instincts of System 1, though it's more mentally taxing.
To continue delivering a product that engineers of any specialisation can operate safely, we knew we had to invoke the System 2 part of the brain and prevent an overly permissive, multi-tenant pattern from entering the allowlists.
Data
We now have over 100 million lines of log samples, thanks to the customers who've chosen to leave Automated System Health Reporting turned on. This data suggests that unpredictable subdomains, ones that cannot be known in advance, are most often FQDNs of CDNs or of APIs that use client-side load balancing. Some examples:
7-32-3-app.agent.datadoghq.com
cdn02.quay.io
b9-kjr-312q4.us-east4.gcp.confluent.cloud
repeater-local-007-ec2-use1e-prod.browserstack.com
Using System 2, the invariant part of these examples can be rather quickly spotted!
7-32-3-app.agent.datadoghq.com: Datadog's documentation confirms the pattern is <VERSION>-app.agent.datadoghq.com. With * unavailable and each wildcard matching exactly one character, this could be defined as ?-??-?-app.agent.datadoghq.com. If logs from dry-run/monitor mode also show a single digit in the middle, another pattern could be added to the allowlist: ?-?-?-app.agent.datadoghq.com.
cdn02.quay.io: Red Hat's documentation lists the possible variations. The patterns would be cdn.quay.io and cdn??.quay.io.
b9-kjr-312q4.us-east4.gcp.confluent.cloud: In this case, another FQDN that is a substring of this one was seen passing through the firewall: kjr-312q4.us-east4.gcp.confluent.cloud. kjr-312q4 would appear to be the tenant ID of the customer this log line is from. (Original alphanumeric identifiers have been changed to prevent information disclosure.) The number after the leading b ranges from 0 to 26. So the patterns for accessing this Kafka cluster would be kjr-312q4.us-east4.gcp.confluent.cloud, b?-kjr-312q4.us-east4.gcp.confluent.cloud and b??-kjr-312q4.us-east4.gcp.confluent.cloud.
repeater-local-007-ec2-use1e-prod.browserstack.com: With similar FQDNs accessed, such as repeater-local-008-dcp-use3a-prod.browserstack.com and repeater-local-052-ec2-use3b-prod.browserstack.com, System 1 could have defaulted to *.browserstack.com. With the option of * taken away, System 2 will see that the pattern is repeater-local-???-???-us???-prod.browserstack.com. The use1e, use3a and use3b segments are geographies, since cac1a, euw1a and apse2a were also observed in that position. System 2 may well have prevented your data from being processed in a location it wasn't meant to be.
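Before the patterns derived above go into an allowlist, they can be checked against the observed FQDNs. The sketch below uses a hypothetical matcher that implements the Syntax rules; it is not DiscrimiNAT code. The cac1a hostname is constructed for illustration from the geography codes mentioned above.

```python
import re


def matches(pattern: str, fqdn: str) -> bool:
    """Hypothetical matcher: each _ or ? stands for exactly one of a-z, 0-9, -."""
    rx = "".join("[a-z0-9-]" if ch in "_?" else re.escape(ch) for ch in pattern)
    return re.fullmatch(rx, fqdn) is not None


# Patterns derived in this section (discriminat:tls: prefix omitted).
ALLOWLIST = [
    "?-??-?-app.agent.datadoghq.com",
    "?-?-?-app.agent.datadoghq.com",  # only if dry-run logs show a single digit
    "cdn.quay.io",
    "cdn??.quay.io",
    "kjr-312q4.us-east4.gcp.confluent.cloud",
    "b?-kjr-312q4.us-east4.gcp.confluent.cloud",
    "b??-kjr-312q4.us-east4.gcp.confluent.cloud",
    "repeater-local-???-???-us???-prod.browserstack.com",
]


def allowed(fqdn: str) -> bool:
    return any(matches(p, fqdn) for p in ALLOWLIST)


observed = [
    "7-32-3-app.agent.datadoghq.com",
    "cdn02.quay.io",
    "b9-kjr-312q4.us-east4.gcp.confluent.cloud",
    "repeater-local-007-ec2-use1e-prod.browserstack.com",
    "repeater-local-008-dcp-use3a-prod.browserstack.com",
    "repeater-local-052-ec2-use3b-prod.browserstack.com",
]
assert all(allowed(f) for f in observed)

# Every broker from b0 to b26 is covered by the b? and b?? patterns.
assert all(allowed(f"b{n}-kjr-312q4.us-east4.gcp.confluent.cloud") for n in range(27))

# A constructed cac1a hostname is refused: the literal "us" keeps traffic
# to the us-prefixed geographies.
assert not allowed("repeater-local-007-ec2-cac1a-prod.browserstack.com")
print("derived patterns cover the observed FQDNs")
```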
Other potential candidates include, for example, 890831537354-dot-europe-west2.kernels.googleusercontent.com. The first group of digits appears to be a tenant's numeric project ID, so it's best not to replace it with a *, as that would match all possible tenant projects on GCP!
The more rigorous, System 2-invoking syntax seems to cover all the use cases in the data we have.
Feedback
At this time, we believe that by letting you specify patterns in a safer way than the match-zero-or-more-characters convention of the * symbol, we're enabling our customers to simplify management of their rule-sets whilst addressing legitimate challenges with variable addressing. We remain, of course, open to critical feedback and willing to change, provided relevant data and challenges are shared with us.
Do write to us with your thoughts.
If you're not familiar with them yet, we have 3½-minute videos (AWS, GCP), with copyable CLI commands underneath, that can help you discover and extract the list of FQDNs that particular apps in your environment have been accessing. The output is in a format that can be used straightaway in the syntax, speeding up your IaC iterations.