Most SOCs I have walked into have a detection problem they would describe as a tooling problem. The analysts will tell you the SIEM is too slow, the EDR is too chatty, or the rules are too rigid. Sometimes those things are true. Far more often, the real problem is that no-one is engineering the detections — they are simply accumulating, like fossils in a riverbed.
What detection engineering actually is
Detection engineering is a small, deliberate idea: treat each detection as a unit of code with a job description, an owner, and a test suite. That sentence sounds dry, but the consequences are profound. It means that for every rule on your platform you can answer four questions without flinching: what is this meant to catch, what data does it depend on, what would prove it works, and how would we know it has stopped working? Most rules in most SIEMs cannot answer any of those.
The discipline is borrowed from software engineering, and it brings the same artefacts with it. Detections live in version control. Each one ships with positive samples — events that should make it fire — and negative samples — events that should not. There is a continuous integration pipeline that runs the suite on every change, on every platform upgrade, and on every new schema. There is a retirement process. There is, crucially, a definition of done that is not "the rule was deployed" but "the rule is firing on what we expected and not firing on what we did not".
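To make those artefacts concrete, here is a minimal sketch of what one detection-as-code record might carry, assuming a home-grown rules repository in Python rather than any particular platform or rule format; the `Detection` class, its fields, and the example values are illustrative, not a real tool's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    """One rule, carrying enough metadata to answer the four questions above."""
    name: str                     # stable identifier used on the platform and in alerts
    owner: str                    # a named person or team, not "the SOC"
    intent: str                   # what this is meant to catch, in plain English
    data_sources: list[str]       # the log sources and fields it depends on
    query: str                    # the platform-specific detection logic
    positive_samples: list[dict] = field(default_factory=list)   # events that must fire
    negative_samples: list[dict] = field(default_factory=list)   # events that must not


# An illustrative record; the values are invented for the example.
scheduled_task_persistence = Detection(
    name="win-scheduled-task-persistence",
    owner="detection-engineering@example.org",
    intent="Catch persistence via newly registered scheduled tasks running from user-writable paths",
    data_sources=["windows.security 4698 (TaskName, TaskContent, SubjectUserName)"],
    query="<platform-specific query lives here>",
    positive_samples=[{"EventID": 4698, "TaskContent": r"C:\Users\Public\update.exe"}],
    negative_samples=[{"EventID": 4698, "TaskContent": r"C:\Windows\System32\svchost.exe"}],
)
```

Version control, the test suite, and the retirement process then all operate on records like this one, rather than on whatever happens to be live in the platform.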
Why most detection programmes drift
Drift starts the moment a rule is written without a test, because the writer is the only person who understands what it was for. Six months later that person has moved teams; eighteen months later, they have left. The rule is still there. It still fires. Nobody disables it because nobody is sure what it is for, and the disable button feels heavier than the keep button. Multiply that by a few thousand rules and you have the SIEM most of us have inherited at some point in our careers.
The other source of drift is the data layer. A schema changes, a log source moves, a parser is updated, and silently the rule stops firing on real events while still firing on synthetic ones. Without tests against production-shaped data, this kind of failure is invisible until something bad happens and the rule that should have caught it did not.
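This failure mode is cheap to test for. Here is a minimal sketch of a data-layer canary, assuming a hypothetical `recent_events()` fetcher that returns a small sample of freshly ingested, production-shaped events per log source; the field lists are illustrative.

```python
# recent_events(source, limit) is a stand-in for a query that returns a small
# sample of events ingested in the last hour for a given log source.
from ingest_api import recent_events  # hypothetical module

# Fields the detection logic depends on, per log source. In practice this would
# live alongside each rule in the repository rather than in one hard-coded dict.
REQUIRED_FIELDS = {
    "windows.security": ["EventID", "SubjectUserName", "NewProcessName"],
    "proxy.requests": ["src_ip", "url", "user_agent"],
}


def missing_fields(events: list[dict], required: list[str]) -> set[str]:
    """Fields that no recent event carries: a strong hint a parser or schema has changed."""
    seen: set[str] = set()
    for event in events:
        seen.update(event.keys())
    return set(required) - seen


for source, required in REQUIRED_FIELDS.items():
    gone = missing_fields(recent_events(source, limit=500), required)
    if gone:
        print(f"{source}: fields missing from fresh data: {sorted(gone)}")
```

A check like this will not tell you the rule logic is right, but it will tell you the day the data underneath it quietly changed shape.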
The first six things to put in place
If you are starting from scratch, or starting from the rubble of someone else's detection programme, the order I tend to follow is roughly this. First, get the rules into version control, even if all that means initially is a nightly dump from the SIEM. Second, give every rule an owner — a name in a field that the platform respects. Third, add a one-line description that says what the rule is meant to catch, in English, not in detection-platform jargon. Fourth, capture at least one positive sample per rule, even if it is hand-crafted. Fifth, write a small harness that replays the samples and asserts the rule fires. Sixth, add a CI step that runs the harness on every change.
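For the fifth and sixth items, the harness does not have to be clever. Here is a minimal sketch in pytest, assuming the `Detection` records above are collected in a hypothetical `rules_repo` module and that `evaluate_rule()` wraps whatever replay mechanism your platform offers (a local rule evaluator, a test tenant, a SIEM API); both names are stand-ins, not a real product's interface.

```python
import pytest

# Both imports are hypothetical: ALL_DETECTIONS is the list of Detection records
# loaded from version control, and evaluate_rule(query, event) returns True if
# the rule fires on that single event when replayed on your platform.
from rules_repo import ALL_DETECTIONS, evaluate_rule


@pytest.mark.parametrize("det", ALL_DETECTIONS, ids=lambda d: d.name)
def test_positive_samples_fire(det):
    # Every event captured as a positive sample must still trigger the rule.
    for event in det.positive_samples:
        assert evaluate_rule(det.query, event), f"{det.name}: positive sample did not fire"


@pytest.mark.parametrize("det", ALL_DETECTIONS, ids=lambda d: d.name)
def test_negative_samples_stay_quiet(det):
    # Known-benign events must not trigger the rule.
    for event in det.negative_samples:
        assert not evaluate_rule(det.query, event), f"{det.name}: negative sample fired"
```

The CI step in the sixth item is then nothing more exotic than running this suite on every change and every platform upgrade; if the suite goes red, the rule does not ship.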
None of this is glamorous. None of it solves an active incident. But on the day you have an active incident, having those six things in place is what lets you say with confidence which detections you can trust, and which you cannot.
Coverage as a map, not a scoreboard
It is tempting, once you have a tidy rules repository, to map it to ATT&CK and produce a heatmap. Heatmaps are useful as maps and dangerous as scoreboards. They show you where you have detection logic; they say almost nothing about whether your logic actually defeats a real adversary on your real estate. A bright green technique on a heatmap can hide a detection that has not fired in eighteen months because the underlying log source moved.
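One way to keep the heatmap honest is to cross-reference coverage with evidence of life. Here is a minimal sketch, assuming a hypothetical `last_real_fire()` lookup against your alerting history; the 90-day threshold is illustrative and should reflect each rule's expected base rate.

```python
from datetime import datetime, timedelta, timezone

# Both imports are hypothetical: ALL_DETECTIONS as before, and last_real_fire(name)
# returns the most recent time that rule fired on production data, or None if never.
from alert_history import last_real_fire
from rules_repo import ALL_DETECTIONS

STALE_AFTER = timedelta(days=90)


def stale_detections():
    """Yield rules that would colour a heatmap green but show no recent sign of life."""
    now = datetime.now(timezone.utc)
    for det in ALL_DETECTIONS:
        last = last_real_fire(det.name)
        if last is None or now - last > STALE_AFTER:
            yield det.name, last


for name, last in stale_detections():
    print(f"{name}: last real fire {last or 'never'}")
```

A silent rule is not automatically broken, since some techniques are genuinely rare, but a cell that is green on the heatmap and silent in the alert history deserves a closer look than the colour suggests.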
I would rather have ten detections I can prove work than a hundred that look good in a screenshot. The hundred is a target the team can chase forever; the ten is a foundation the team can grow from.
A note on AI-generated detections
There is currently a great deal of enthusiasm for letting language models write detections from natural-language descriptions of TTPs. The output is often syntactically pleasant and semantically wrong in ways that take a tired analyst a long time to spot. If you do this, treat the model the way you would treat a cheerful intern: take its work, strip it down, write the test cases yourself, and only ship what survives the harness. The discipline is what makes detection engineering work, and it is the discipline that the model cannot give you.
Where this leads
A mature detection programme stops feeling like a fight against the SIEM and starts feeling like the slow, deliberate accumulation of capability. You can answer hard questions quickly: do we catch this technique, on this data source, on this estate? You can retire confidently. You can introduce new platforms without losing what you had. Most importantly, the analyst at three in the morning has a fighting chance of understanding why the alert in front of them fired.
That last bit, in the end, is the entire point.
Related reading
If this piece was useful, the most directly adjacent posts on the site are:
- Threat hunting without the marketing
- Running an incident: what actually matters
- Purple as a permanent posture, not an exercise
The skills page groups all ten companion articles by area of practice, and the experience page covers the engagements that the practice was shaped by.