Snort plugins and the rise of the preprocessor

Snort has been quietly evolving over the last six months. The headline progress is the size of the rule library — there are now several hundred community-contributed rules, well organised — but the more interesting development is the preprocessor architecture, which is where the engine moves from being a pattern-matcher to being a stateful protocol-aware analyser.

This matters for a specific reason: pure pattern-matching is easy to evade. Preprocessors close most of the easy evasion routes.

What evasion looks like

A naive Snort rule for an HTTP attack looks like:

alert tcp any any -> any 80 (content:"/cgi-bin/phf"; ...)

This fires on any TCP packet to port 80 containing the literal substring /cgi-bin/phf. An attacker who wants to evade this rule has many options.

TCP segmentation. Send the request in many small TCP segments. The IDS sees GET /cgi-, then a separate segment bin/, then phf. The pattern is split across packets. A naive matcher inspects each packet on its own, does not reassemble the stream, and so never matches.

IP fragmentation. Send the IP packets fragmented at the IP layer. The IDS sees the fragments individually rather than the reassembled packet. Patterns that span fragment boundaries are not matched.

HTTP-level encoding. Encode the URL: %2fcgi-bin%2fphf. The matcher sees the encoded form. The web server decodes it and treats it identically. The attack succeeds; the IDS misses it.

Path manipulation. /.//cgi-bin/.//phf is equivalent to /cgi-bin/phf after path normalisation, but does not match the literal substring.

HTTP method tricks. POST instead of GET if the rule was tied to GET. HEAD instead of GET to probe the server while evading rules that watch only GET.

All of these are documented evasion techniques. Several were described in detail by Ptacek and Newsham in their 1998 paper, "Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection". They are, by now, well known to attackers and defenders alike.
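The evasions in this section can be reproduced against a toy matcher. The sketch below is not Snort code: naive_match and the variants table are hypothetical names, and each payload is a made-up example of what a sensor might see on the wire. It only illustrates why literal, per-packet matching misses everything except the unmodified request.

```python
PATTERN = b"/cgi-bin/phf"

def naive_match(packets):
    """Per-packet literal matching: no reassembly, no decoding."""
    return any(PATTERN in payload for payload in packets)

# Each variant is the list of packet payloads the sensor would see
# (hypothetical examples, application bytes only).
variants = {
    "literal":     [b"GET /cgi-bin/phf HTTP/1.0\r\n\r\n"],
    "tcp split":   [b"GET /cgi-", b"bin/", b"phf HTTP/1.0\r\n\r\n"],
    "url encoded": [b"GET %2fcgi-bin%2fphf HTTP/1.0\r\n\r\n"],
    "path tricks": [b"GET /.//cgi-bin/.//phf HTTP/1.0\r\n\r\n"],
}

results = {name: naive_match(packets) for name, packets in variants.items()}
for name, hit in results.items():
    print(f"{name:12} {'alert' if hit else 'missed'}")
```

Only the literal variant produces an alert; the other three reach the web server unnoticed.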

What a preprocessor does

A preprocessor in Snort is a module that runs before the rule-matching engine. Its job is to put the packet stream into a normalised form, so that the rules see traffic as the destination would see it after all the protocol layers have done their work.

The preprocessors that have shipped in recent Snort releases include:

stream (or its successor stream4). Reassembles TCP streams. The matcher now sees the full HTTP request as a single byte sequence, regardless of how the attacker split it across TCP segments.

frag (or its successor frag2). Reassembles IP fragments. The matcher sees the reassembled IP packet, not the fragments.

http_decode. Decodes URL escaping. %2fcgi-bin%2fphf is normalised to /cgi-bin/phf before matching. Optionally also normalises path manipulation tricks (the /./ and // redundancies).

portscan. Watches for a single source probing many ports in a short period and produces a single high-level alert instead of many low-level ones.

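telnet_decode and rpc_decode. Equivalent normalisers for protocol-specific encoding: telnet_decode strips Telnet negotiation sequences and rpc_decode joins fragmented RPC records, so that content rules see the plain payload.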
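The aggregation idea behind portscan can be sketched in a few lines. This is an illustration of the general technique (count distinct destination ports per source inside a sliding time window), not Snort's implementation; the class name PortscanDetector, the threshold of five ports, and the three-second window are all assumptions chosen for the example.

```python
from collections import defaultdict, deque

class PortscanDetector:
    """Hypothetical sliding-window detector: one alert per scanning source."""

    def __init__(self, port_threshold=5, window_seconds=3.0):
        self.port_threshold = port_threshold
        self.window = window_seconds
        self.events = defaultdict(deque)  # src -> deque of (timestamp, port)
        self.alerted = set()              # sources already reported

    def observe(self, timestamp, src, dst_port):
        """Record one connection attempt; return an alert string or None."""
        queue = self.events[src]
        queue.append((timestamp, dst_port))
        # Forget attempts that have fallen out of the time window.
        while queue and timestamp - queue[0][0] > self.window:
            queue.popleft()
        distinct_ports = {port for _, port in queue}
        if len(distinct_ports) >= self.port_threshold and src not in self.alerted:
            self.alerted.add(src)
            return f"portscan from {src}: {len(distinct_ports)} ports"
        return None

detector = PortscanDetector()
# Eight probes in under a second from one (made-up) source address.
alerts = [detector.observe(0.1 * i, "10.0.0.9", 20 + i) for i in range(8)]
print([a for a in alerts if a is not None])
```

Eight probes yield one alert rather than eight. The obvious weakness of this design is a scan slow enough that no window ever holds enough distinct ports.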

With these in place, a rule that says content:"/cgi-bin/phf" will fire whether the attacker sends:

The literal string in one packet.
The string split across several TCP segments.
The string fragmented at the IP layer.
The string with URL encoding.
Various combinations of the above.

The rule is unchanged. The preprocessor takes care of the variants.
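As a rough sketch of that division of labour, the toy pipeline below runs a chain of normalising functions before a single unchanged content match. The function names are hypothetical stand-ins for the real preprocessors, the reassembly assumes segments arrive in order with no loss, and the path clean-up is applied crudely to the whole stream rather than to a parsed request target.

```python
from urllib.parse import unquote_to_bytes

RULE_CONTENT = b"/cgi-bin/phf"  # the rule never changes

def reassemble(segments):
    """Stand-in for stream reassembly: join in-order segment payloads."""
    return b"".join(segments)

def url_decode(stream):
    """Stand-in for URL decoding: undo one layer of %XX escaping."""
    return unquote_to_bytes(stream)

def squash_path(stream):
    """Collapse '/./' and '//' redundancies until nothing changes."""
    previous = None
    while previous != stream:
        previous = stream
        stream = stream.replace(b"/./", b"/").replace(b"//", b"/")
    return stream

# Order matters: an escape such as %2f can straddle a segment boundary,
# so decoding has to happen after reassembly.
PREPROCESSORS = [reassemble, url_decode, squash_path]

def inspect(segments):
    data = segments
    for preprocess in PREPROCESSORS:
        data = preprocess(data)
    return RULE_CONTENT in data

variants = {
    "literal":           [b"GET /cgi-bin/phf HTTP/1.0"],
    "tcp split":         [b"GET /cgi-", b"bin/", b"phf HTTP/1.0"],
    "url encoded":       [b"GET %2fcgi-bin%2fphf HTTP/1.0"],
    "path tricks":       [b"GET /.//cgi-bin/.//phf HTTP/1.0"],
    "split and encoded": [b"GET %2", b"fcgi-bin%", b"2fphf HTTP/1.0"],
}

for name, segments in variants.items():
    print(f"{name:18} {'alert' if inspect(segments) else 'missed'}")
```

Every variant alerts against the same RULE_CONTENT. A single decoding pass is a deliberate simplification: a doubly encoded escape such as %252f would come out as %2f and still slip past.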

Why this is harder than it sounds

The naive description of preprocessors makes them sound straightforward — "just reassemble streams before matching". The actual implementation is harder because the IDS has to make exactly the same decisions as the destination would.

For example, what does the destination do with overlapping TCP segments? If two segments cover the same byte range with different content, does the destination keep the first or the second? The answer depends on the operating system. Windows differs from Linux; Linux 2.0 differs from Linux 2.2; FreeBSD differs from Solaris.

If the IDS reassembles overlapping segments differently from the destination, it sees a different byte stream. An attacker who knows the destination's behaviour and the IDS's behaviour can craft segments that produce one stream when seen by the IDS and a different stream when seen by the destination. The malicious payload appears in the destination's view but not in the IDS's. The attack succeeds; the IDS sees nothing matching its rules.
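A small sketch makes the mismatch concrete. It reduces the overlap question to two hypothetical policies, "first" (keep the byte that arrived first) and "last" (let later arrivals overwrite). Real stacks are more nuanced than this, and the segment offsets and payloads are invented for the example.

```python
def reassemble_overlap(segments, policy):
    """Rebuild a byte stream from (offset, payload) pairs in arrival order.

    policy "first": the earliest-arriving byte at each position wins.
    policy "last":  later arrivals overwrite earlier bytes.
    Assumes the segments leave no gaps in the stream.
    """
    if policy not in ("first", "last"):
        raise ValueError("policy must be 'first' or 'last'")
    stream = {}
    for offset, payload in segments:
        for i, byte in enumerate(payload):
            position = offset + i
            if policy == "first" and position in stream:
                continue  # keep what arrived earlier
            stream[position] = byte
    return bytes(stream[p] for p in sorted(stream))

# The attacker sends a decoy for bytes 13-15, then overlaps it with the
# real payload.
segments = [
    (0,  b"GET /cgi-bin/"),
    (13, b"xyz"),          # decoy
    (13, b"phf"),          # overlapping retransmission with different content
    (16, b" HTTP/1.0"),
]

ids_view = reassemble_overlap(segments, "first")
host_view = reassemble_overlap(segments, "last")
print(ids_view)
print(host_view)
```

Here a sensor that keeps the first copy reassembles a request for /cgi-bin/xyz, while a destination that keeps the last copy receives /cgi-bin/phf. Swap the two policies and the attacker simply sends the segments in the opposite order.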

This is the core of the insertion and evasion attacks that Ptacek and Newsham described. It has not gone away. The current generation of Snort preprocessors handles the common cases by defaulting to the most common reassembly behaviour, but the underlying problem is fundamental. Any IDS that does not run on the destination machine itself is vulnerable to evasion wherever its reassembly behaviour differs from that of the destination's stack.

What I have done with my own setup

A fortnight of experimenting:

I have enabled stream, frag, and http_decode. The first two had no measurable effect on alert volume — the attacks I was already seeing were not using fragmentation. The third produced an immediate increase: rules that had been generating two or three alerts a day started generating ten or twenty, because attacks I had been silently missing were now being detected.

The extra alerts were almost all true positives — actual probes and exploit attempts — that had been getting through the matcher because of trivial encoding. None of the attackers, on the available evidence, were using sophisticated evasion. They were using the simple URL-escaping that comes with most HTTP clients automatically.

This is, in itself, an interesting data point. The evasion sophistication of the attackers hitting random IPs is low. Preprocessors make a meaningful difference at this baseline. Sophisticated targeted attackers would still evade me, but they were going to evade me anyway.

What this points to

The wider lesson is that an IDS is not a single component. It is a stack: link-layer capture, protocol normalisation, pattern matching, alert handling. The pattern-matching layer gets all the attention because that is where the rules live. The normalisation layer is at least as important and gets very little attention.

This is going to change. As Snort's preprocessor library grows, and as commercial IDS products mature, the centre of gravity in the discipline is going to shift. The interesting problems will be in protocol modelling (how exactly real implementations of HTTP, TCP, and IP behave under edge cases) rather than in pattern writing.

The academic intrusion-detection community has been working on this for years. The operational community is just starting to catch up.

A practical note for rule writers

With preprocessors in place, you can — and should — write rules against the normalised form of the traffic, not against the wire form. Rules that try to match URL-encoded variants are now obsolete; the preprocessor has decoded the URL. Rules that try to match across fragment boundaries are now obsolete; the preprocessor has reassembled.

The rule library is being updated to reflect this. If you are still maintaining rules that include manual encoding of common URL escapes, you can simplify them once you have the preprocessor enabled.

The simpler rule is also easier to maintain. The preprocessor is doing the work. The rule is doing the meaning.