Building defences against floods, when you cannot defend

Last week I described what a UDP flood looks like at the packet level. This week I want to write about what I have actually changed on my own infrastructure as a result of the Minnesota incident and the surrounding research. None of these stop a real distributed attack. All of them are worth doing anyway.

The mindset shift

The single most important thing the last few weeks of reading have done is to shift my mental model of "defence" for this category of attack.

For most attack types, the question is: will my hosts survive being attacked? You harden, patch, monitor, and the result is a host that resists compromise.

For distributed denial of service, the question is fundamentally different. I cannot prevent the attack. What I can do is:

  1. Reduce the cost of being targeted, so that I survive the attack with less damage.
  2. Ensure my hosts are not used as attack sources against others.
  3. Have a response plan for when an attack does happen.

This is operational resilience, not host hardening. It is a different discipline. Most of what I have written about until now has been hardening. Resilience is what this post is about.

Reducing the cost of being a target

Three changes that are within reach of a small operator.

Drop unsolicited traffic at the firewall, not deep in the network stack. As I mentioned last week, an unsolicited UDP packet to a closed port produces an ICMP Port Unreachable reply by default. The reply costs bandwidth, and generating it costs the host's network stack work.

My current ruleset silently denies all incoming UDP traffic to closed ports at the firewall. The UDP stack never sees the packets, so no replies are generated. Bandwidth cost is roughly halved (the inbound packets still arrive, but the outbound replies are gone), and host effort is essentially zero. The relevant ipchains rule:

ipchains -A input -p udp -j DENY

— with explicit accept rules above it for the few UDP services I actually run (DNS, NTP).
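
A sketch of the full ordering, with 192.0.2.10 standing in as a placeholder for my host's address:

# accept the UDP services I actually run, most specific first
ipchains -A input -p udp -d 192.0.2.10 53 -j ACCEPT
ipchains -A input -p udp -d 192.0.2.10 123 -j ACCEPT
# then silently deny everything else; no ICMP reply is generated
ipchains -A input -p udp -j DENY

Order matters: ipchains matches top to bottom, so the blanket deny has to come last.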

Rate-limit ICMP unconditionally. Even with the firewall change above, there are some legitimate ICMP responses I want my host to send (echo replies for ping, for instance). I have set sysctl net.ipv4.icmp_ratelimit=200 and net.ipv4.icmp_msgs_per_sec=10. This caps the rate at which the host generates ICMP traffic, regardless of how much triggering traffic arrives. Legitimate pings work; floods that try to provoke an ICMP storm by hammering the host with triggering traffic do not.
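
For the record, the same values in /etc/sysctl.conf form, matching the format of the TCP settings below:

net.ipv4.icmp_ratelimit = 200
net.ipv4.icmp_msgs_per_sec = 10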

Limit half-open TCP connections. The Linux kernel maintains a SYN queue per listening socket; SYN flood attacks fill this queue. Setting tcp_max_syn_backlog higher and enabling tcp_syncookies makes the kernel resistant to small SYN floods. The relevant sysctls:

net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_synack_retries = 2

This raises the queue size, enables SYN cookies (which I first came across while reading the kernel's network stack code), and reduces the number of retransmissions for unanswered SYN-ACKs. The combination handles small SYN floods without service interruption.
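
Two quick ways to watch this working, assuming a stock kernel and net-tools:

# count half-open connections sitting in the SYN queue
netstat -tan | grep SYN_RECV | wc -l
# the kernel logs a warning when SYN cookies kick in
dmesg | grep -i 'SYN flooding'

When the queue overflows and cookies take over, the kernel prints a message along the lines of "possible SYN flooding on port 80. Sending cookies."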

Ensuring my hosts are not part of the problem

This is where the bulk of the recent reading has pushed me. The Minnesota attack was not enabled by 200 attacker-owned machines — it was enabled by 200 compromised machines, owned by other people, conscripted into the attack.

For my own hosts, the relevant disciplines are:

Egress filtering. I wrote about this in June; the relevant rule is the one preventing source-spoofed packets from leaving my network. Now applied universally.
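
The shape of the rule, with 192.0.2.0/24 standing in for my own address block and eth0 for the outside interface:

# only packets carrying my own source addresses may leave
ipchains -A output -i eth0 -s 192.0.2.0/24 -j ACCEPT
# anything else is spoofed by definition: log it, then deny it
ipchains -A output -i eth0 -j DENY -l

The -l flag is the interesting part. A spoofed packet trying to leave my network means a compromised host, and I want a log entry telling me so.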

Outbound traffic monitoring. Any unusual outbound traffic from my hosts should produce an alert. MRTG shows me daily totals. Snort running on my own outbound interface watches for known DDoS-tool signatures: Trinoo's daemon-to-master communication, the Stacheldraht control channel, anything that looks like a flood pattern leaving my hosts.
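
As an illustration, a rule in the style of the public Snort signature sets, watching for a Trinoo daemon on my network answering its master (the daemon-to-master channel runs over UDP port 31335, and PONG is the daemon's reply to the master's png command):

alert udp $HOME_NET any -> $EXTERNAL_NET 31335 (msg:"DDoS Trin00 daemon to master PONG"; content:"PONG";)

Note the direction: this watches for the traffic leaving my network, not entering it.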

Outbound rate limits. No host on my network should be able to generate more than a small multiple of its normal outbound traffic before something throttles it. This is not absolute protection — a determined attacker with shell access can still flood at the link rate — but it slows down the most automated attack tools, which generally do not bother to throttle themselves to look normal.
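
A minimal sketch of such a cap with the tc tool from iproute2, using a token bucket filter on the outside interface (the 1mbit figure is a placeholder; the right value is a small multiple of the host's normal peak):

tc qdisc add dev eth0 root tbf rate 1mbit burst 10kb latency 50ms

Traffic over the sustained rate gets queued and then dropped, which is exactly what I want from a host that has suddenly decided to flood someone.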

Periodic compromise audits. Every two weeks I review every running process on every host I administer, check the file modification times in /bin, /sbin, /usr/bin and /usr/sbin, run tripwire-like integrity checks against the known-good baselines, and verify that nothing looks like an unauthorised daemon. This is tedious. It is also the only way to catch a compromise that has not yet started doing anything visible.
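
The modification-time check, at least, is one line, and cheap enough to run between the fortnightly audits:

# anything in the system binary directories changed in the last 14 days?
find /bin /sbin /usr/bin /usr/sbin -type f -mtime -14 -ls

A hit is not proof of compromise (package upgrades touch these directories too), but it is always worth explaining.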

Having a response plan

This is the part most operators (including me, until last month) do not have written down.

The specific scenarios I have planned for:

My uplink saturates from inbound traffic. I lose access to my own infrastructure from the outside. I need an out-of-band channel to my upstream. For me, this is a phone call to my ISP's NOC and a known account number. The phone number is on a printed card in my desk drawer. The account number is memorised. This is, frankly, the bare minimum.

The upstream blocks the wrong thing. When the upstream applies emergency filters, they may inadvertently block legitimate traffic. I need to know how to verify, post-block, whether legitimate traffic is being affected. This is a checklist of "can my mail reach me, can DNS resolve me, can my own outbound traffic reach Google" type tests.
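
A sketch of that checklist as a script, with example.com standing in for my own domain; the first two tests only mean anything run from an outside vantage point:

#!/bin/sh
# can DNS still resolve me?
dig example.com A
# can mail still reach me? probe the SMTP port
telnet mail.example.com 25 < /dev/null
# can my own outbound traffic still get out? (run from inside)
ping -c 3 www.google.com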

The attack is sustained for hours or days. The Minnesota attack itself ran for two days. I need to know what I am willing to do to maintain operations during that period: accept a degraded service, fail over to a different IP if I have one, or simply wait it out.

The attack ends suddenly. Attacks end. Often without warning. My response plan needs to include the un-do steps: when the upstream's emergency filters can be removed, when the rate limits can be relaxed, when normal service resumes.

None of this is rocket science. It is the kind of basic operational discipline that, written down once, is enormously valuable when needed. Without it, the moment of crisis is the moment when the planning happens, badly.

A small honest assessment

For my own scale of operation, the realistic assessment is: I am exposed to any attack with serious resources behind it. There is no defence at my level that holds up against a Minnesota-scale flood. What I can do is not be the obvious low-hanging target, and not contribute to attacks against others.

This is, in some real sense, a coordination problem rather than a host-security problem. Every operator on the internet has a role in keeping their own hosts from being conscripted. The aggregate of those small disciplines — applied to enough hosts — is what eventually shrinks the available pool for attackers.

This is what good operational hygiene actually does. It does not protect you. It protects everyone except you, slowly, indirectly. Knowing this is what makes the discipline sustainable. You do it because it is the right thing to do, not because you will personally see the benefit.

Which is, in itself, a more mature mental model of security than I had a year ago.
