Why default deny still fails in practice

Every introduction to network security tells you to set your firewall's default policy to deny. Allow only what is needed. Block everything else.

This is correct advice. It is also routinely broken in practice, including by people who can recite the principle in their sleep. I want to write about why.

The principle

The principle is simple. There are two ways to write a firewall ruleset.

The default allow approach is: by default, packets pass. Add rules to deny specific things you do not want. This is sometimes called a blacklist policy.

The default deny approach is: by default, packets do not pass. Add rules to permit specific things you do want. This is a whitelist policy.

The security argument for default deny is overwhelming. With default allow, every threat you have not anticipated gets through. With default deny, every threat you have not anticipated is blocked, at the cost of also blocking some legitimate traffic you forgot to whitelist.

The security cost of being wrong is asymmetric. A default-allow oversight is a hole. A default-deny oversight is an outage. Outages are noticed and fixed. Holes are not noticed at all.
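The asymmetry is easy to make concrete with a toy packet filter. A minimal sketch in Python — the addresses, rule predicates, and field names are all illustrative, not taken from any real firewall:

```python
# Toy packet filter: a ruleset is a list of (action, predicate) pairs.
# The default policy decides the fate of any packet no rule matches.

def evaluate(rules, default_action, packet):
    """Return 'allow' or 'deny' for a packet (a dict with src/dst/port)."""
    for action, predicate in rules:
        if predicate(packet):
            return action
    return default_action

# Default allow: enumerate known-bad traffic and deny it.
known_bad = lambda p: p["src"] == "198.51.100.7"
blacklist_rules = [("deny", known_bad)]

# Default deny: enumerate known-good traffic and allow it.
web_ok = lambda p: p["dst"] == "10.1.1.5" and p["port"] == 443
whitelist_rules = [("allow", web_ok)]

# An unanticipated packet -- nobody wrote a rule for it.
surprise = {"src": "203.0.113.9", "dst": "10.1.1.99", "port": 31337}

evaluate(blacklist_rules, "allow", surprise)  # -> 'allow': the silent hole
evaluate(whitelist_rules, "deny", surprise)   # -> 'deny': the noisy outage
```

The last two lines are the whole argument: the same oversight produces a hole in one regime and an outage in the other, and only the outage generates a signal.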

How it fails in practice

The principle is not the problem. The problem is what happens when you have to maintain a default-deny ruleset over time, in an environment where requirements change.

A new service goes live. Someone adds a permit rule for the new traffic. They are not, for understandable reasons, completely sure what traffic the service generates — it is a new service, the docs are imperfect, and the testing was on an internal network where the firewall was open. The rule they add is, accordingly, broader than strictly necessary. "Allow TCP from anywhere to this service's port" rather than "allow TCP from this list of customer ranges to this service's port".

The service works. The team moves on. The rule remains.

Three months later, another service goes live, with a similar rule. Six months later, ten such rules. A year in, the actual ruleset reads:

# (default deny)
allow any -> 10.1.1.5:tcp/443
allow any -> 10.1.1.6:tcp/443
allow any -> 10.1.1.6:tcp/8080
allow any -> 10.1.1.7:tcp/22
allow any -> 10.1.1.7:tcp/80
allow any -> 10.1.1.8:any/any
...

This is not, in any meaningful sense, default deny. This is default allow with a list of internal IPs. The default-deny posture is preserved. The default-deny property is not.

The mechanism that produces this

The mechanism is structural, not personal. Every team I have seen produce the above had members who understood and believed in default deny. None of them set out to write a permissive ruleset. The drift is the natural consequence of a few small forces:

Service onboarding has a deadline. Rule auditing does not. When a service is going live tomorrow, the right rule is whatever rule makes it work. The right rule for a known-good ruleset, six months later, would be narrower. There is rarely a process that does the narrowing.

The cost of being wrong is asymmetric in the wrong direction. If a permit rule is too narrow, the new service breaks and the team responsible is paged. If the same rule is too broad, nothing visible happens. The system rewards over-permission.

Visibility decays. A new rule is reviewed by whoever wrote it. Six months later, that rule is one of three hundred, and nobody is reviewing the cumulative shape of the ruleset. The picture of "what the firewall actually does" exists nowhere coherent.

What helps, in practice

A few things, each of which I have either tried on my own gear or watched larger operators do.

Scoped rule expressions. Where the firewall language allows, write the rule narrowly the first time. Allow specific source ranges, not any. Allow specific destination ports, not any/any. Yes, this requires more thought up-front. It is still cheaper than the auditing exercise that does not happen.

Periodic rule reviews on a calendar. Every quarter, walk the ruleset. Identify rules that have not seen a packet in 90 days; consider removing them. Identify rules with any in source or destination; consider whether they could be tightened.
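The mechanical part of that quarterly walk can be scripted. A sketch, assuming the ruleset can be exported as records with a last-match timestamp — the field names and the export format here are invented for illustration, not any vendor's:

```python
from datetime import datetime, timedelta

def audit(rules, now, stale_after=timedelta(days=90)):
    """Flag rules that are stale (no match within stale_after) or
    over-broad ('any' in the source or destination slot)."""
    findings = []
    for r in rules:
        if r["last_match"] is None or now - r["last_match"] > stale_after:
            findings.append((r["id"], "stale: consider removing"))
        if r["src"] == "any" or r["dst"] == "any":
            findings.append((r["id"], "over-broad: consider tightening"))
    return findings

now = datetime(2024, 1, 1)
rules = [
    {"id": 1, "src": "any", "dst": "10.1.1.5:tcp/443",
     "last_match": now - timedelta(days=2)},
    {"id": 2, "src": "203.0.113.0/24", "dst": "10.1.1.6:tcp/443",
     "last_match": now - timedelta(days=200)},
]
audit(rules, now)
# -> [(1, 'over-broad: consider tightening'), (2, 'stale: consider removing')]
```

The script does not decide anything; it only makes the review queue explicit, which is most of what a calendar-driven review needs.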

Rule expression comments. Every permit rule should have a comment that says what service this is for, who owns it, and when it was added. Without that, you cannot retire a rule with confidence, because you do not know who to ask.

Default-deny logs that you actually read. The log of dropped traffic is informative. If you are seeing legitimate-looking drops from a known-good source, you have either a misconfigured rule or a misconfigured service. Either way, you want to know.
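Actually reading drop logs scales better with a little aggregation. A sketch that tallies drops by source address, so a known-good source showing up repeatedly stands out — the log line format here is made up for illustration:

```python
from collections import Counter

def top_dropped_sources(log_lines, n=5):
    """Count DROP lines per source address. A frequent known-good source
    points at a misconfigured rule or a misconfigured service."""
    counts = Counter()
    for line in log_lines:
        # Assumed line shape: "DROP src=10.2.0.9 dst=10.1.1.5 dport=443"
        if line.startswith("DROP"):
            fields = dict(f.split("=") for f in line.split()[1:])
            counts[fields["src"]] += 1
    return counts.most_common(n)

log = [
    "DROP src=10.2.0.9 dst=10.1.1.5 dport=443",
    "DROP src=10.2.0.9 dst=10.1.1.5 dport=443",
    "DROP src=198.51.100.7 dst=10.1.1.6 dport=22",
]
top_dropped_sources(log)  # -> [('10.2.0.9', 2), ('198.51.100.7', 1)]
```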

The deeper version of the lesson

The specific problem is firewalls. The general problem is that any default-deny mechanism — file permissions, mandatory access controls, application-level allow-lists — has the same drift property. Allowing more is fast. Removing what was already allowed is slow and politically expensive.

The only way default-deny works in the long term is to treat the narrowness of the policy as its own metric, separate from its correctness. "Are there any packets being dropped that should not be?" is the wrong question by itself. "How specific are our permit rules, and how recently were they reviewed?" is the question that distinguishes a default-deny posture from a default-deny relic.

None of this is novel. All of it is the kind of thing that gets nodded at in textbooks and ignored in practice. Writing it down here mostly so I can come back to it the next time I am tempted to add a permit rule with any in any of its slots, and remember why I shouldn't.

