I have spent the last fortnight writing my own Snort rules. The default ruleset that ships with Snort is small, well-curated, and fine to start with — but it is general. The whole point of an in-house detection capability is that you tailor the rules to your own environment. Writing my own has taught me three things that the documentation does not quite say.
The rule language is small. The discipline is large.
A Snort rule has, in its current form, five major elements: an action (alert, log, pass, and so on), a protocol, source and destination address-and-port specifications, a direction arrow, and a parenthesised list of options.
A simple rule:
alert tcp any any -> $HOME_NET 80 (msg:"WEB-CGI phf access"; content:"/cgi-bin/phf"; nocase;)
In English: alert on any TCP packet from any source to any port-80 service in my home network whose payload contains /cgi-bin/phf, case-insensitive.
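Laid out element by element, with the spacing exaggerated for clarity (Snort itself does not care about the extra whitespace), the same rule reads:

```snort
# action  proto  src addr  src port  dir  dst addr   dst port  (options)
alert     tcp    any       any       ->   $HOME_NET  80         (msg:"WEB-CGI phf access"; content:"/cgi-bin/phf"; nocase;)
```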
You can read the entire rule language reference in an afternoon. The hard part, after that, is the rule content — picking the right substring to match on, deciding what variants to cover, working out which traffic should or should not trigger.
Lesson one: every rule needs a negative test
My first version of every rule fires on the attack I care about. Good. The second version of every rule, written about an hour later, has been narrowed because the first version also fires on something legitimate I did not anticipate.
The canonical example: I wrote a rule looking for /cgi-bin/handler in HTTP requests. This is a standard probe for a particular known-vulnerable CGI. The rule fired. It also fired several times an hour against my own web server, because I had a script in /cgi-bin/handler that I had written for an unrelated reason. My logs were full of "alert: my server is being attacked" alerts that were, in fact, me reading my own page.
Fix: scope by destination. Add !$MY_SERVER to the destination, or — more usefully — combine a flow condition with a second content match, so the rule requires a real exploit pattern in an established request to the server, not just the URL.
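A sketch of the destination-scoped version, with $MY_SERVER standing in for my own web server's address (the variable name is mine, not from any stock ruleset):

```snort
# exclude my own server from the destination, so my legitimate
# /cgi-bin/handler script stops triggering the alert
alert tcp any any -> !$MY_SERVER 80 (msg:"WEB-CGI handler access"; content:"/cgi-bin/handler"; nocase;)
```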
The deeper version of this lesson: a rule without a negative test is a rule with an unknown false-positive rate. Until you have walked the rule against a corpus of normal traffic and seen what it catches, you do not know whether it is going to be useful or just noisy.
Lesson two: rule order matters in non-obvious ways
Snort processes rules in the order they appear in the configuration. The first rule to match a packet logs the alert; subsequent rules can either also match (if you ask) or be skipped. This means a generic rule that comes before a specific one will swallow packets that the specific one would have caught.
I hit this with two CGI rules. The generic one — "any access to /cgi-bin/" — matched first. The specific one — "the formmail.pl exploit" — never got a look-in, because the generic one had already taken the packet.
Snort handles this with the pass action and with rule-set ordering. The right architecture is: most-specific rules first, generic catch-all rules last. This is the inverse of how most people instinctively organise rule files (they tend to write the broad cases first because those are the obvious ones), and it is worth being deliberate about.
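In the rules file, that means the specific signature is written above the catch-all. A sketch, with msg strings of my own invention:

```snort
# specific: the formmail.pl exploit probe, first
alert tcp any any -> $HTTP_SERVERS 80 (msg:"WEB-CGI formmail.pl access attempt"; content:"/cgi-bin/formmail.pl"; nocase;)

# generic: any /cgi-bin/ access, last, as the catch-all
alert tcp any any -> $HTTP_SERVERS 80 (msg:"WEB-CGI generic cgi-bin access"; content:"/cgi-bin/"; nocase;)
```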
Lesson three: write rules in pairs
For every "this looks suspicious" rule, I now also write a corresponding "this is the normal version" rule. The normal-version rule is set to pass so that it overrides the suspicious version when both apply.
The motivation is structural. If I tune a rule by adding negative conditions to the alert version, those conditions are buried inside the option list and easy to miss. If I write a separate pass rule for the legitimate case, it is a separate, named, reviewable thing. When the legitimate case changes, I update the pass rule. The alert rule never has to change.
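A sketch of such a pair, using my handler script again. $MY_SERVER and $TRUSTED_HOSTS are my own variable names, and note the caveat that classic Snort evaluates alert rules before pass rules unless you swap the order (the -o flag, or the equivalent config directive):

```snort
# the explicit-allow half: known clients reading my own handler script
pass tcp $TRUSTED_HOSTS any -> $MY_SERVER 80 (content:"/cgi-bin/handler"; nocase;)

# the catch-everything-suspicious half, which never has to change
alert tcp any any -> $MY_SERVER 80 (msg:"WEB-CGI handler access"; content:"/cgi-bin/handler"; nocase;)
```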
This is the same discipline you eventually arrive at in any matching system — separate the catch-everything-suspicious case from the explicit-allow-list case, so that each can be maintained independently.
A worked example: phf revisited
Here is the rule I started with two weeks ago:
alert tcp any any -> any 80 (msg:"PHF access"; content:"phf";)
This fires on the substring phf anywhere in any TCP packet to any port 80. Anywhere. Including any payload that happens to contain those three letters in sequence, including the URL /staff/phf-employee-list.html on a perfectly legitimate website. Awful rule.
Here is the rule I have now:
alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS \
(msg:"WEB-CGI phf access attempt"; \
flow:to_server,established; \
content:"/cgi-bin/phf"; nocase; http_uri; \
classtype:web-application-attack;)
Differences:
- Bound to $EXTERNAL_NET -> $HTTP_SERVERS $HTTP_PORTS, not "anywhere to anywhere on 80".
- Requires an established TCP flow with the request going to the server. This drops a lot of noise from scan traffic.
- Matches the substring inside the URI, not anywhere in the payload. The http_uri modifier is a recent addition I am still getting used to.
- Case-insensitive, so /CGI-BIN/PHF is also caught.
- Has a classtype so the alert can be aggregated meaningfully.
The rule is longer and harder to read. It is also, I think, an order of magnitude better in production.
What I am writing rules for now
A short list of categories I am working through, in priority order:
- CGI exploit attempts. The Bugtraq archive is full of them. Each needs its own rule.
- Failed authentication patterns. SSH, Telnet, POP3 — anything where a brute-force shows up as repeated failures.
- Scan footprints. nmap's various scan modes have characteristic packet shapes.
- Specific known-bad payloads. Buffer-overflow shellcode for popular targets.
The last category is the most fragile, because attackers can vary the shellcode trivially. The first three are sturdier and more useful in the long run.
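For the scan category, the classic example is the SYN-FIN scan: a flag combination no normal TCP stack emits, which makes it a sturdy signature. A sketch along the lines of the stock scan rules:

```snort
# SYN and FIN set together is a scanner fingerprint, not normal traffic
alert tcp $EXTERNAL_NET any -> $HOME_NET any (msg:"SCAN SYN FIN"; flags:SF; classtype:attempted-recon;)
```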
Next time, the same exercise but for outbound traffic — which is, I am realising, where the more interesting alerts live.