nmap -O: how OS fingerprinting actually works

I have been using nmap's -O flag for OS fingerprinting for over a year. Until this week I had only a vague mental model: "send some weird packets, watch the responses, match against a database". This is broadly correct. The actual technique, when you read the implementation and the Phrack 54 paper Fyodor wrote on it, is more clever and more subtle than the slogan implies.

This post is the writeup.

What stack fingerprinting exploits

The TCP/IP RFCs specify how implementations should behave under normal conditions. They are largely silent or ambiguous about how implementations should behave under abnormal conditions: packets with unusual flag combinations, packets to closed ports with unexpected options, packets that violate constraints in subtle ways.

Different operating systems' TCP/IP stacks have, over the years, made different choices about how to handle these edge cases. The choices were made independently, by different developers, at different times, often based on whatever the developer found convenient. The choices are consistent within an OS and different between OSes.

This is the substrate of fingerprinting. A small number of carefully chosen abnormal probes produces a response signature that is close to unique per OS version. The signature can be matched against a database to identify the OS.

The specific tests nmap uses

Fyodor's approach uses about a dozen distinct tests. The ones I find most interesting:

Test 1: Initial sequence number generation. When a host responds to a SYN, it picks an initial sequence number (ISN) for its own side of the connection. Different OSes choose the ISN with very different algorithms. Some use a counter that ticks at fixed intervals. Some use a random source. Some use a cryptographic mix. The pattern of ISNs over multiple connections (their increments, their distribution, their correlations) is highly OS-specific.

nmap probes this by opening several connections in quick succession and measuring the differences between consecutive ISNs. The pattern alone narrows the OS to a handful of candidates.
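The delta analysis can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not nmap's classifier: the category names and the min_step and small thresholds are invented for the example.

```python
from functools import reduce
from math import gcd

MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def isn_deltas(isns):
    """Differences between consecutive ISNs, taken modulo 2**32."""
    return [(b - a) % MOD for a, b in zip(isns, isns[1:])]


def classify_isns(isns, min_step=1000, small=2 ** 20):
    """Coarse label for a list of sampled ISNs (needs at least three).

    min_step and small are arbitrary illustrative thresholds.
    """
    deltas = isn_deltas(isns)
    if len(deltas) < 2:
        raise ValueError("need at least three ISN samples")
    if all(d == 0 for d in deltas):
        return "constant"
    # A large common divisor suggests a counter bumped in fixed steps.
    if reduce(gcd, deltas) >= min_step:
        return "fixed-step"
    if max(deltas) < small:
        return "small-increment"
    return "random-looking"
```

For example, classify_isns([1000, 65000, 193000]) returns "fixed-step", because both deltas are multiples of 64000. A real classifier would need many more samples and would look at timing as well.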

Test 2: TCP options ordering. When a SYN packet is received, the responding SYN-ACK includes a set of TCP options — MSS, window scaling, selective acknowledgement support, timestamps. The order in which these options appear is implementation-dependent. There is no spec mandating any particular order. Different OSes use consistently different orderings, which means the ordering alone narrows the field.
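To make the ordering concrete, here is a minimal sketch that walks the raw options bytes of a TCP header and returns the option names in the order they appear. The option kind numbers are the standard ones; the short names and the two sample byte strings in the usage note are made up for illustration and are not attributed to any particular OS.

```python
# Standard TCP option kinds; the short labels are my own.
OPTION_NAMES = {0: "EOL", 1: "NOP", 2: "MSS", 3: "WS", 4: "SACKOK", 8: "TS"}


def option_order(raw):
    """Return the names of the TCP options in `raw`, in wire order."""
    order = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        order.append(OPTION_NAMES.get(kind, "opt%d" % kind))
        if kind == 0:      # End of Option List: stop parsing
            break
        if kind == 1:      # NOP is a single byte with no length field
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("truncated option")
        length = raw[i + 1]  # length covers the kind and length bytes too
        if length < 2 or i + length > len(raw):
            raise ValueError("bad option length")
        i += length
    return order
```

Two hypothetical responses carrying the same options in different orders yield different lists, and that list is what gets compared:

```python
a = bytes.fromhex("020405b4 0402 080a0000000100000000 01 030307")
b = bytes.fromhex("020405b4 01 030300 01 01 080a0000000000000000")
print(option_order(a))  # ['MSS', 'SACKOK', 'TS', 'NOP', 'WS']
print(option_order(b))  # ['MSS', 'NOP', 'WS', 'NOP', 'NOP', 'TS']
```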

Test 3: Don't-Fragment bit handling. Some operating systems set the IP Don't-Fragment (DF) bit on some of the packets they send, typically to support path MTU discovery. Others never set it, and those that do set it differ in which packets carry it. nmap records whether DF is set on the responses to its probes. The behaviour identifies the family of the OS.
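Whatever behaviour is being probed, the DF bit itself is a single flag in the IPv4 header of the reply. A minimal sketch of reading it from raw header bytes, assuming the caller has already captured the packet (the function name is mine):

```python
import struct


def df_bit_set(ip_header):
    """True if the Don't-Fragment flag is set in a raw IPv4 header."""
    if len(ip_header) < 20:
        raise ValueError("IPv4 header is at least 20 bytes")
    if ip_header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    # Bytes 6-7 hold 3 flag bits followed by the 13-bit fragment offset.
    (flags_and_offset,) = struct.unpack("!H", ip_header[6:8])
    return bool(flags_and_offset & 0x4000)  # 0x4000 is the DF flag
```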

Test 4: Window size. The TCP receive window size advertised in the response is implementation-dependent and surprisingly consistent within an OS version. Linux 2.0 typically advertises one value; 2.2 advertises another; FreeBSD advertises a third; each Windows version is distinct. The exact integer is a fingerprint.

Test 5: ACK number on closed-port RST. When you send a packet with the FIN flag set to a closed port, most OSes respond with an RST. Some OSes set the ACK number in that RST to the FIN packet's sequence number plus one (treating the FIN as legitimate and counting it as one unit of sequence space). Others echo the sequence number unchanged, set it to zero, or use some other value. The choice is OS-specific.
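The comparison is simple enough to write down. A minimal sketch, with labels of my own choosing, that reduces the RST's ACK number to one of a few categories relative to the probe's sequence number:

```python
MOD = 2 ** 32  # sequence and ACK numbers are 32-bit and wrap around


def classify_rst_ack(probe_seq, rst_ack):
    """Label the ACK number of an RST relative to the probe's sequence number.

    The labels are illustrative. Checks run in order, so a probe sequence
    number of 0 is reported as "seq" rather than "zero".
    """
    if rst_ack == (probe_seq + 1) % MOD:
        return "seq+1"
    if rst_ack == probe_seq:
        return "seq"
    if rst_ack == 0:
        return "zero"
    return "other"
```

Picking a probe sequence number that is neither 0 nor 2**32 - 1 keeps the categories unambiguous.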

Test 6: TCP options handling on closed ports. When you send a SYN carrying TCP options to a closed port, some OSes echo those options back in their RST response. Some do not. Some echo only certain options. The pattern is diagnostic.

Test 7: ICMP error message quoting. When a host sends an ICMP error in response to an unexpected packet, the spec (RFC 792) requires the message to quote the offending packet's IP header plus the first 64 bits (8 bytes) of its payload. Some OSes include exactly that; some include more; some include less. Some pad with junk; some pad with zeros. The exact bytes returned are diagnostic.
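The length part of this test can be sketched as follows. The code assumes the input starts at the ICMP header of an error message (8 bytes of ICMP header, then the quoted datagram); the function names and the three labels are my own, and the sketch ignores the content of the quoted bytes, which is also diagnostic.

```python
def quoted_payload_len(icmp):
    """Bytes quoted beyond the embedded IP header in an ICMP error message."""
    if len(icmp) < 8 + 20:
        raise ValueError("too short to hold an ICMP header and a quoted IP header")
    quoted = icmp[8:]                 # skip type, code, checksum, unused
    ihl = (quoted[0] & 0x0F) * 4      # quoted IP header length in bytes
    if ihl < 20 or len(quoted) < ihl:
        raise ValueError("malformed quoted IP header")
    return len(quoted) - ihl


def classify_quote(icmp):
    """Compare the quoted length with the 8 bytes the spec asks for."""
    n = quoted_payload_len(icmp)
    if n == 8:
        return "exact"
    return "more" if n > 8 else "less"
```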

Why this works as well as it does

Individually, no test is decisive. Multiple OSes might handle ISN generation similarly. Several share a TCP options ordering. The Don't-Fragment behaviour collapses to a handful of categories.

The combination is what produces the unique signature. nmap runs all of the tests, builds a tuple of the results, and matches the tuple against the fingerprint database. The tuple is much more specific than any single test.
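A toy version of the matching step shows why the combination matters. Everything in this sketch is invented for illustration: the database entries, the OS names, the test keys and the values. nmap's real fingerprint database uses its own format and a more tolerant matching scheme.

```python
# Hypothetical fingerprint database: one dict of test results per OS.
FINGERPRINTS = {
    "os-a": {"isn": "fixed-step", "opts": "MSS", "df": False, "ack": "seq"},
    "os-b": {"isn": "random", "opts": "MSS", "df": True, "ack": "seq"},
    "os-c": {"isn": "random", "opts": "MSS,NOP,WS", "df": True, "ack": "seq+1"},
}


def candidates(observed, db=FINGERPRINTS):
    """Names of entries that agree with every test result in `observed`."""
    return sorted(
        name
        for name, fingerprint in db.items()
        if all(fingerprint.get(test) == value for test, value in observed.items())
    )
```

With this toy data, each single test leaves two candidates, while two tests together leave one:

```python
print(candidates({"isn": "random"}))                 # ['os-b', 'os-c']
print(candidates({"opts": "MSS"}))                   # ['os-a', 'os-b']
print(candidates({"isn": "random", "opts": "MSS"}))  # ['os-b']
```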

This is the same property that makes structured logs work: the join of multiple weak signals is much more informative than any single signal.

What this means for defenders

A few implications.

You cannot easily hide your OS from a determined fingerprinter. The behaviour being measured is structural to the kernel's TCP/IP stack. Changing it requires kernel patches. There are kernel patches that do this — Linux's IP Personality is one — but they are non-trivial to deploy and have other consequences.

The information leaked has real value to attackers. Knowing the exact OS version of a target tells an attacker which exploits to try. Linux 2.0 has different vulnerabilities than 2.2; the difference matters when an attacker is choosing what to throw at you.

The defensive posture should assume fingerprinting. Do not rely on "they don't know what we run". They do, or they can find out cheaply. Plan accordingly.

Fingerprinting is itself detectable. The probe sequence is distinctive — abnormal packets to specific port ranges, in specific orders. Snort rules for nmap fingerprinting are part of the standard ruleset and fire reliably on real attempts. Detection does not stop the fingerprinting (the data leaks before the alert fires), but it does tell you that someone is interested.

What I have actually changed

Knowing the technique has changed two things on my own infrastructure.

First, I now run nmap fingerprinting against my own hosts as part of the periodic self-assessment I do. The output tells me what an external attacker can determine about my hosts. The output is, broadly, accurate — nmap correctly identifies my Linux 2.2 boxes — and that is itself the useful information: the attacker can know what I am running.

Second, my Snort configuration now has rules for the specific probe patterns nmap uses. The classic indicators (the FIN-to-closed-port probe, the multiple-SYN ISN-sampling sequence, the TCP options test) all generate alerts. The alerts cannot block the fingerprinting, because the probes have already reached me by the time they fire, but they do tell me when someone is taking an active interest.
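The logic behind two of those indicators can be sketched in Python. This is not Snort rule syntax and not what my ruleset contains; it is a rough illustration over already-parsed packets, and the syn_burst and window thresholds are arbitrary assumptions. A busy legitimate client can open many connections quickly, so a real rule needs more context than this.

```python
from collections import defaultdict

# TCP flag bits as they appear in the header's flags byte.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def suspicious_sources(packets, syn_burst=6, window=1.0):
    """Map each suspicious source to the sorted reasons it was flagged.

    packets: iterable of (timestamp_seconds, source, tcp_flags), in time order.
    """
    reasons = defaultdict(set)
    syn_times = defaultdict(list)
    for ts, src, flags in packets:
        # A FIN with no SYN, ACK or RST does not belong to any normal exchange.
        if flags & FIN and not flags & (SYN | ACK | RST):
            reasons[src].add("fin-without-ack")
        if flags == SYN:
            times = syn_times[src]
            times.append(ts)
            # Drop SYNs that have fallen out of the sliding window.
            while ts - times[0] > window:
                times.pop(0)
            if len(times) >= syn_burst:
                reasons[src].add("syn-burst")
    return {src: sorted(found) for src, found in reasons.items()}
```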

The general lesson

The specific point is about TCP/IP fingerprinting. The general point is that any system rich enough to be useful exposes information about itself through its responses to abnormal input. The information leakage is not a bug — it is a property of the system being deterministic and rule-following.

This means defence in this category is not about preventing the leakage. It is about making the leakage less useful. If the population of OS versions is large and diverse, fingerprinting tells the attacker less. If the population is concentrated, fingerprinting is essentially perfect identification. The shape of the defence is, again, platform diversity.

The more I work in this discipline, the more I see the same general pattern recurring. Diversity is operationally costly and structurally protective. Homogeneity is operationally cheap and structurally fragile. The trade-off is real, and the cost of homogeneity is largely borne by parties other than the operator who chooses it.