Reading Snort 1.7 source

Snort 1.7 is in beta. The release adds substantial new capabilities — better stream reassembly, new detection options, improved performance — and the architecture has matured in ways worth understanding. I have been reading the source over the past fortnight.

This post is the write-up. The internal architecture matters because the design decisions in Snort 1.7 are shaping how operational intrusion detection gets done generally.

What the engine does, internally

A Snort sensor's processing pipeline, as the source reveals it:

Stage 1: capture. Snort reads packets via libpcap from a network interface. The interface is in promiscuous mode; Snort sees every packet on the wire (subject to the limits of the network medium and any switching).

Stage 2: decode. Each packet is parsed through layered decoders — Ethernet, IP, then TCP/UDP/ICMP, then application-layer headers if relevant. The decoder produces an in-memory packet structure with each layer's headers identified.
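
The decode stage is easy to picture in miniature. Here is a toy version of the layered idea — the struct and field names are mine, not Snort's actual decoder types, and it handles only Ethernet/IPv4/TCP:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sketch of layered decoding; not Snort's real structures. */
typedef struct {
    const uint8_t *eth;     /* start of the Ethernet header        */
    const uint8_t *ip;      /* start of the IP header, if present  */
    const uint8_t *tcp;     /* start of the TCP header, if present */
    const uint8_t *payload;
    uint32_t payload_len;
} DecodedPacket;

/* Returns 0 on success, -1 if the packet is too short or not IPv4/TCP. */
int decode(const uint8_t *pkt, uint32_t len, DecodedPacket *dp)
{
    memset(dp, 0, sizeof(*dp));
    if (len < 14) return -1;                      /* Ethernet header */
    dp->eth = pkt;
    uint16_t ethertype = (uint16_t)(pkt[12] << 8 | pkt[13]);
    if (ethertype != 0x0800) return -1;           /* not IPv4 */

    const uint8_t *ip = pkt + 14;
    if (len < 14 + 20) return -1;
    uint32_t ihl = (ip[0] & 0x0f) * 4;            /* IP header length in bytes */
    if ((ip[0] >> 4) != 4 || ihl < 20) return -1;
    dp->ip = ip;
    if (ip[9] != 6) return -1;                    /* protocol 6 = TCP */

    const uint8_t *tcp = ip + ihl;
    if (len < 14 + ihl + 20) return -1;
    uint32_t thl = (tcp[12] >> 4) * 4;            /* TCP data offset in bytes */
    dp->tcp = tcp;
    dp->payload = tcp + thl;
    dp->payload_len = len - 14 - ihl - thl;
    return 0;
}
```

Each layer validates its own header before handing the remainder to the next layer down — that is the whole trick, repeated three times.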

Stage 3: preprocessors. Preprocessors run on the decoded packet. The 1.7 preprocessors include frag2 (IP fragmentation reassembly), stream4 (TCP stream reassembly), http_decode (URL normalisation), rpc_decode, telnet_decode, portscan2 (port-scan detection), and bo (Back Orifice traffic detection). Each preprocessor can modify the packet, drop it, or generate an alert directly.

Stage 4: rules. The packet, now normalised by the preprocessors, is matched against the loaded rule set. Rules are organised into chains by protocol and direction; the matching engine walks the relevant chain looking for matches.
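
A toy version of chain-walking, with invented names — each rule is a node in a linked list, and matching means walking the list and testing each rule's options in turn:

```c
#include <string.h>

/* Illustrative sketch of rule-chain walking; names are mine, not Snort's. */
typedef struct Rule {
    int dst_port;            /* -1 matches any port          */
    const char *content;     /* NULL means no content check  */
    const char *msg;
    struct Rule *next;
} Rule;

/* Walk one protocol's chain; return the first matching rule, or NULL. */
const Rule *match_chain(const Rule *chain, int dst_port,
                        const char *payload, int plen)
{
    for (const Rule *r = chain; r != NULL; r = r->next) {
        if (r->dst_port != -1 && r->dst_port != dst_port)
            continue;                     /* header options first: cheap */
        if (r->content != NULL) {         /* payload search last: costly */
            int clen = (int)strlen(r->content);
            int found = 0;
            for (int i = 0; i + clen <= plen; i++)
                if (memcmp(payload + i, r->content, clen) == 0) {
                    found = 1;
                    break;
                }
            if (!found) continue;
        }
        return r;                         /* all options matched */
    }
    return NULL;
}
```

The ordering inside the loop — cheap header checks before the payload search — is the same cost discipline the real engine follows.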

Stage 5: actions. A matched rule's action is applied — generate an alert, log the packet, or pass it (ignore the packet; pass rules are how known-benign traffic skips the rest of the ruleset).

Stage 6: output. Alerts are written via the configured output plugins. The plugins range from simple text logfiles to syslog, to database insertion, to more sophisticated alerting systems.

The pipeline is strictly sequential: Snort is single-threaded, and each packet passes through every stage before the next is read from the capture interface. Keeping up with the wire therefore depends entirely on per-packet processing cost.

The preprocessor architecture, more carefully

The preprocessor system is where the 1.7 architecture is most interesting. Each preprocessor is a self-contained module with a defined interface:

void SetupPreprocessor(void);                 /* registers the config keyword     */
void PreprocessorInit(u_char *args);          /* parses arguments from snort.conf */
void PreprocFunction(Packet *p);              /* called for every decoded packet  */
void CleanupFunction(int signal, void *data); /* registered for exit and restart  */

A preprocessor registers itself for one or more event types — "packet decoded", "alert generated", "shutdown" — and is called when those events occur. The interface is small enough that writing a new preprocessor is feasible; several third-party preprocessors exist.
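
Mechanically, the registration is just a function-pointer list walked for every packet. A sketch with invented names — the real API registers by configuration keyword, but the dispatch shape is the same:

```c
/* Illustrative sketch of preprocessor dispatch; names are hypothetical. */
typedef struct { const char *data; int len; int alerts; } Packet;

#define MAX_PREPROCS 16
typedef void (*PreprocFunc)(Packet *);
static PreprocFunc preproc_list[MAX_PREPROCS];
static int num_preprocs = 0;

/* Called by each preprocessor's setup routine at startup. */
void RegisterPreprocessorFunc(PreprocFunc f)
{
    if (num_preprocs < MAX_PREPROCS)
        preproc_list[num_preprocs++] = f;
}

/* Called once per decoded packet, before rule matching. */
void RunPreprocessors(Packet *p)
{
    for (int i = 0; i < num_preprocs; i++)
        preproc_list[i](p);
}

/* A toy preprocessor: count occurrences of 0xFF 0xFB in the payload
 * (telnet option negotiation) -- purely illustrative. */
static void toy_telnet_check(Packet *p)
{
    for (int i = 0; i + 1 < p->len; i++)
        if ((unsigned char)p->data[i] == 0xFF &&
            (unsigned char)p->data[i + 1] == 0xFB)
            p->alerts++;
}
```

The simplicity is the point: a new preprocessor is one setup function plus one per-packet function, and the engine never needs to know what the module does internally.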

The most architecturally important preprocessor is stream4. It maintains state for every TCP connection seen on the network, reassembles streams, and feeds the reassembled bytes to the rules engine as a single coherent stream rather than per-packet fragments.

Reading the stream4 source has made me appreciate just how much state a serious IDS has to maintain. For every active TCP connection on the monitored network, the preprocessor tracks:

  • The connection's two endpoints (IP and port).
  • The current sequence numbers in both directions.
  • The receive windows.
  • Outstanding unacknowledged segments.
  • The connection's TCP session state (setup, established, closing).
  • Reassembly buffers per direction.
  • A timer for connection expiry.

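The per-connection record can be sketched as a C struct plus a hash-table lookup. Everything here — the field names, the symmetric hash, the fixed session pool — is my own illustration of the shape of the problem, not stream4's actual code:

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Illustrative per-connection state in the spirit of a session table. */
typedef struct Session {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint32_t next_seq_client, next_seq_server;  /* expected sequence numbers */
    time_t last_seen;                           /* for connection expiry     */
    struct Session *next;                       /* hash-bucket chain         */
} Session;

#define NUM_BUCKETS 1024
static Session *table[NUM_BUCKETS];

/* Symmetric hash so both directions of a connection land in one bucket. */
static unsigned hash_key(uint32_t sip, uint32_t dip, uint16_t sp, uint16_t dp)
{
    return ((sip ^ dip) + (sp ^ dp)) % NUM_BUCKETS;
}

Session *find_session(uint32_t sip, uint32_t dip, uint16_t sp, uint16_t dp)
{
    for (Session *s = table[hash_key(sip, dip, sp, dp)]; s; s = s->next)
        if ((s->src_ip == sip && s->dst_ip == dip &&
             s->src_port == sp && s->dst_port == dp) ||
            (s->src_ip == dip && s->dst_ip == sip &&
             s->src_port == dp && s->dst_port == sp))
            return s;
    return NULL;
}

Session *add_session(uint32_t sip, uint32_t dip, uint16_t sp, uint16_t dp)
{
    static Session pool[4096];   /* fixed pool: no malloc in this sketch */
    static int used = 0;
    if (used >= 4096) return NULL;
    Session *s = &pool[used++];
    memset(s, 0, sizeof(*s));
    s->src_ip = sip;  s->dst_ip = dip;
    s->src_port = sp; s->dst_port = dp;
    unsigned h = hash_key(sip, dip, sp, dp);
    s->next = table[h];
    table[h] = s;
    return s;
}
```

Even this stripped-down version makes the cost visible: every packet triggers a lookup, and every connection occupies a pool slot until it expires.
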
For a busy network with thousands of concurrent connections, this is a substantial memory footprint and a substantial amount of bookkeeping. The preprocessor's correctness — handling overlapping segments, retransmissions, asymmetric routing, and so on — is non-trivial. Reading the code is a useful exercise in appreciating why an IDS operating at line rate is genuinely hard.

The rule-matching engine

The rules engine in 1.7 is substantially smarter than earlier versions. The improvement: rules with a content match get organised into a fast-path search structure (essentially a multi-pattern matcher in the Aho-Corasick family) so that one pass through the packet payload can identify which content patterns match.

This changes the asymptotic complexity. Earlier versions were O(rules × payload) — every rule examined against every byte of every packet. The 1.7 engine is roughly O(payload + rules-with-matches), which scales much better as the rule set grows.
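
The shape of that win can be shown with a toy one-pass matcher. This is not Aho-Corasick — it merely indexes patterns by their first byte, so a single scan of the payload only tests patterns that could start at the current position — but the structural point, one payload pass instead of one per rule, is the same:

```c
#include <string.h>

#define MAX_PATTERNS 8

/* Illustrative only; a real multi-pattern matcher builds an automaton. */
typedef struct { const char *pat; int id; } Pattern;

/* Returns the id of the first pattern found in payload, or -1. */
int scan_once(const Pattern *pats, int npats,
              const char *payload, int plen)
{
    /* bucket pattern indices by first byte */
    int bucket[256][MAX_PATTERNS];
    int count[256] = {0};
    for (int i = 0; i < npats; i++) {
        unsigned char b = (unsigned char)pats[i].pat[0];
        bucket[b][count[b]++] = i;
    }
    /* one pass over the payload */
    for (int pos = 0; pos < plen; pos++) {
        unsigned char b = (unsigned char)payload[pos];
        for (int j = 0; j < count[b]; j++) {
            const Pattern *p = &pats[bucket[b][j]];
            int len = (int)strlen(p->pat);
            if (pos + len <= plen && memcmp(payload + pos, p->pat, len) == 0)
                return p->id;
        }
    }
    return -1;
}
```

With a thousand content patterns, the naive approach touches the payload a thousand times; the bucketed approach touches it once and does real comparison work only where a pattern could plausibly begin.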

For my own deployment, with about 150 rules, the difference is not dramatic. For larger deployments — the published community ruleset is now over 1,000 rules — the performance improvement is the difference between line-rate operation and falling behind.

What this is going to change about IDS deployment

A few predictions, written down for future scoring.

Larger rule sets become operationally feasible. The performance improvement means deployments with several thousand rules are now viable on commodity hardware. The community ruleset will grow correspondingly.

Custom preprocessors will proliferate. The interface is clean enough that organisations will start writing their own preprocessors for their specific protocols and traffic patterns. Custom application-layer decoders for proprietary protocols are the obvious case.

Distributed sensor architectures become tractable. With faster sensors that can keep up with line rate, deploying multiple sensors at different points in a network and correlating their output becomes operationally feasible. The output-plugin work in 1.7 supports this — alerts can be sent to a central database, where cross-sensor correlation can happen.

The state-management cost will become a real factor. As preprocessors maintain more state per connection, the memory footprint of a sensor grows. Eventually this is going to drive specialised hardware — IDS sensors with dedicated memory architectures — for high-traffic deployments. We are not yet there, but the trajectory points that direction.

A few specific things I am going to do with 1.7

For my own deployment:

Upgrade once 1.7 is stable. The beta is good but I am not running it in production yet. I expect 1.7 final within a couple of months.

Write a custom preprocessor. I have a specific use case — detecting connections to a small set of internal services that should never be touched from the wider network — that does not fit the existing rules cleanly. A custom preprocessor that maintains the list of "internal services" and flags any external traffic targeting them is a clean expression of this. I will write it as a learning exercise.
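
A sketch of what that preprocessor might look like. The Packet fields, the service list, and the definition of "internal" (10.0.0.0/8 here) are all stand-ins for illustration; the real version would take its service list from snort.conf arguments:

```c
#include <stdint.h>

/* Illustrative sketch; field names and addressing are assumptions. */
typedef struct {
    uint32_t src_ip, dst_ip;   /* host byte order for simplicity */
    uint16_t dst_port;
} Packet;

typedef struct { uint32_t ip; uint16_t port; } Service;

static const Service internal_services[] = {
    { 0x0a000005, 5432 },      /* hypothetical: internal database  */
    { 0x0a000006, 8081 },      /* hypothetical: internal admin app */
};

static int is_internal_addr(uint32_t ip)
{
    return (ip >> 24) == 10;   /* 10.0.0.0/8 */
}

/* Returns 1 (alert) if an external host touches a protected service. */
int check_internal_services(const Packet *p)
{
    if (is_internal_addr(p->src_ip))
        return 0;              /* internal-to-internal traffic is fine */
    int n = (int)(sizeof(internal_services) / sizeof(internal_services[0]));
    for (int i = 0; i < n; i++)
        if (p->dst_ip == internal_services[i].ip &&
            p->dst_port == internal_services[i].port)
            return 1;
    return 0;
}
```

The appeal over a pile of rules is that the policy lives in one place: add a service to the table and the check covers it, with no per-rule duplication.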

Reorganise my rules with the new organisation in mind. The fast-path matching benefits from rules that have specific content strings. Rules without content (relying on flags or other matches alone) are slower. Reviewing my rules with this awareness will produce a faster ruleset.
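
Concretely, the contrast looks like this — two illustrative rules in the standard syntax, not taken from my actual ruleset:

```
# Fast-path friendly: a distinctive content string anchors the match.
alert tcp $EXTERNAL_NET any -> $HOME_NET 80 (msg:"WEB phf probe"; content:"/cgi-bin/phf"; flags:A+;)

# No content option: this rule cannot use the one-pass matcher and is
# evaluated against every packet that reaches its chain.
alert tcp $EXTERNAL_NET any -> $HOME_NET any (msg:"NULL scan"; flags:0;)
```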

Plug Snort output into my structured-log infrastructure. The 1.7 output plugins make this easier than it was. A single tool can then query Snort alerts alongside other operational data.

Why reading source matters

A short reflection. I have written about this before. Reading source is, dollar-for-dollar, the highest-leverage research activity I do. Snort 1.7's source has been particularly rewarding — the codebase is reasonably well organised, the comments are present, and the architecture is visible in the file layout.

The alternative — using a tool without reading its source — works for casual deployment. For serious deployment, where you need to reason about edge cases, performance, evasion possibilities, and tuning, reading the code is the only reliable path. Documentation describes intent; source describes behaviour. The two are not always the same.

For anyone deploying Snort seriously: download the source. Read at least the preprocessor framework and one preprocessor in detail. The investment is a few evenings; the payoff is the ability to reason about what your IDS is actually doing rather than what the manual claims.

More as 1.7 stabilises and I get to use it in production. The next post will be on a topic I have been avoiding — Cisco PIX configuration.

