Another BIND advisory landed last week. This one — a buffer overflow in the handling of NXT records — is bad: it is remotely exploitable, it does not require any authentication, and proof-of-concept code is being passed around on the relevant lists. ISC released BIND 8.2.2-P5 over the weekend.
I have been watching how operators are responding. The picture is uncomfortable, and the lesson is something I had not appreciated until this week: "patched" is not the boolean condition we treat it as.
What the NXT bug is
DNS records have types — A for IPv4 addresses, MX for mail exchanges, AAAA for IPv6 addresses, and so on. NXT ("next") is a less commonly used type, introduced for the DNSSEC security extensions, that asserts the existence of a contiguous range of names in a zone.
BIND's handler for NXT records, when processing a response that contains an NXT, builds an in-memory representation of the record. The buffer it allocates for the name component of the NXT record is fixed-size. A malicious response that includes an NXT record with a sufficiently long name overflows the buffer.
If the buffer is on the stack, the overflow corrupts the return address, and a carefully constructed payload turns that corruption into arbitrary code execution. Since named typically runs as root (it needs root privileges, at least at startup, to bind to port 53), code execution gives the attacker substantial control over the host.
This is the same shape of bug as the wu-ftpd vulnerabilities I wrote about three months ago: unchecked input length, fixed-size buffer, memory corruption, code execution. The pattern is depressingly common.
What "patched" actually requires
Here is where the picture gets uncomfortable. "Patched" is not a single thing.
For a host running BIND 8.2.2-P5 (the patched version), the bug is fixed. The attacker cannot exploit it. This is the simple case.
For a host running BIND 8.2.2 (one revision older), the bug is exploitable. The operator needs to upgrade.
For a host running BIND 8.2.1, the bug is also exploitable, plus several others previously fixed, plus possibly more not-yet-disclosed. Older versions are not just "missing the latest patch"; they are an accumulating bundle of vulnerabilities.
For a host whose operator believes they patched but in fact patched the wrong copy of BIND — e.g. they have two installations and only updated one — the bug is exploitable.
For a host whose operator correctly upgraded but did not restart named after the upgrade, the bug is exploitable. The new binary is on disk; the running process is the old one.
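The disk-versus-memory case is mechanically checkable rather than a matter of memory. A minimal sketch, assuming Linux's /proc filesystem; the helper name and the named path below are mine, not standard tooling:

```shell
# check_running_binary PID EXPECTED_PATH
# Succeeds only if the process was started from EXPECTED_PATH.
# On Linux, /proc/PID/exe is a symlink to the binary the process is
# actually executing; if that file was replaced after the daemon
# started, the link and the installed path disagree.
check_running_binary() {
    running=$(readlink "/proc/$1/exe") || return 2
    [ "$running" = "$2" ]
}

# Hypothetical usage against named:
#   check_running_binary "$(pidof -s named)" /usr/local/sbin/named \
#       || echo "restart needed: running binary differs from installed binary"
```

The same check works for any long-running daemon, which is most of what makes this failure mode easy to miss.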
For a host whose operator did all of the above correctly but is running it inside a chroot whose libraries are also outdated, the bug might still be exploitable, depending on what the exploit needs from the libraries.
For a host running a vendor-supplied BIND from a Linux distribution that has not yet shipped a patched version, the operator is dependent on the vendor's response. If the vendor patches in two days, the operator can be patched in three. If the vendor takes two weeks, the operator is exposed for two weeks regardless of their own diligence.
The word "patched" abstracts over all of this. In practice, none of these conditions are uncommon.
What the survey of my friends shows
I spent an evening this week informally checking with operators I know — about fifteen people — what version of BIND they are running. The results, in rough proportions:
- Three are on 8.2.2-P5: properly patched.
- Six are on 8.2.2 unpatched: aware of the vulnerability but have not yet upgraded. Most cited needing a maintenance window.
- Two are on 8.2.1: had not upgraded since that advisory and did not realise they needed to.
- Two are running ISC's binary on top of a vendor distribution and were not sure which copy of BIND was actually being used.
- One claimed to be patched but, on inspection, had updated the source tree without rebuilding and reinstalling. The running daemon was the old one.
- One was on a build older than the conversation could productively explore.
Ten of the fifteen, two in every three, were definitely exposed at the moment we spoke, and the two who could not say which copy of BIND was answering may have been as well. Most of them are competent operators. They are not lazy and they are not stupid. Each of the complications above contributed to at least one case.
What this means for the threat model
A few things, written down for my own future reference.
The exploitation window for any given vulnerability is wider than the patch availability suggests. From the moment a patch is released, there is a window during which most installations are still vulnerable. The window is not measured in days; it is measured in weeks. Sometimes months.
Attackers know this window. Mass scanning for the previous month's vulnerabilities is a standard activity in the wild. The practical consequence is that patching two weeks after the advisory is far weaker protection than it sounds: by then the host has already sat through the heaviest period of scanning and probing.
The patching pipeline is not just "apply the patch". It is: notice the advisory; decide it applies; obtain the patch; build/test it; schedule the maintenance window; apply it; verify the running version; verify any chrooted or jailed copies; update monitoring to confirm the new version. Each step has failure modes. The end-to-end success rate is not a hundred per cent on any specific deployment.
Auditing your own patch state is a separate discipline from patching. Knowing that BIND is patched on a host is not the same as observing that it is. The audit needs an independent measurement: asking the server itself with dig @server version.bind chaos txt is a fine way to check, and so is the version line named writes to syslog when it starts.
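That measurement is easy to script. A sketch of the comparison, assuming a dig recent enough to support +short and a server that has not restricted the version.bind response in named.conf; the function name and hostname are hypothetical:

```shell
# expect_version SERVER VERSION
# Queries version.bind (class CHAOS, type TXT) on SERVER and compares
# the answer, which dig +short prints as a quoted string, against the
# version we expect to be running there.
expect_version() {
    got=$(dig "@$1" version.bind chaos txt +short)
    if [ "$got" = "\"$2\"" ]; then
        echo "$1: ok, running $got"
    else
        echo "$1: MISMATCH: got $got, expected \"$2\""
        return 1
    fi
}

# Hypothetical usage:
#   expect_version ns1.example.com 8.2.2-P5
```

Note that this trusts the server's self-report; a compromised or deliberately misconfigured named can lie, so it catches mistakes, not adversaries.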
What I have changed
For my own infrastructure:
A nightly cron job runs dig against my own DNS servers and emails me if the version response is anything other than what I expect. This is cheap and would have caught the "updated source but did not restart" case.
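The cron side of this is one line; cron mails a job's output to its owner by default, so the check script only needs to print when something is wrong. The schedule and path here are my own, not anything standard:

```
# crontab fragment: run the version check at 04:10 every night.
# Cron mails any stdout/stderr to the crontab's owner, so a silent
# run means every server answered with the expected version.
10 4 * * * /usr/local/adm/check-bind-version.sh
```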
A separate audit list of every package on every host I run, with their current version and the latest available, refreshed weekly. This is more work than I had been doing and is, I think, the minimum sensible discipline for a production setup.
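The weekly refresh reduces to a comparison of two lists. A sketch, assuming two sorted text files in a format of my own invention, one line of "package version" per entry, one file for what is installed and one for the latest available:

```shell
# audit_outdated INSTALLED_FILE LATEST_FILE
# Both files hold "package version" lines, sorted by package name.
# join pairs them up on the package column; awk then prints only the
# rows where the installed version and the latest version differ.
audit_outdated() {
    join "$1" "$2" | awk '$2 != $3 { print $1 ": installed " $2 ", latest " $3 }'
}

# Hypothetical usage:
#   audit_outdated /usr/local/adm/installed.txt /usr/local/adm/latest.txt
```

The hard part is keeping the "latest available" file honest, which is exactly the weekly work the audit list imposes.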
A change-log discipline for every host: every package update, with date, version, and reason, in a per-host file. This makes it possible, three months from now, to answer the question "when did we patch this" without guessing.
None of these are revolutionary. All of them are the unglamorous work of keeping infrastructure trustworthy. The interesting thing about the BIND incident is that it has made me more aware of how much I had been getting away with — running infrastructure that was, in some real sense, less reliably patched than I had been telling myself.
I suspect the same is true of most operators most of the time. The problem is not that we do not know how to patch. The problem is that the gap between knowing and reliably doing is bigger than we think.