I have spent the past six weeks writing the initial detection content for the Hedgehog SOC build. It is the most concentrated threat-intelligence work I have done in years, and it has produced a sharper view of what the past three years of incidents actually look like operationally. The discipline of writing detection content is — in a way I had not fully appreciated until I was doing it for production use — the discipline of characterising adversaries by their methods rather than by their identity, and the methods are more uniform across very different adversary types than the press coverage suggests.
The thing that became visible over the past six weeks is how shallow the technical innovation pool actually is. Three years of incidents, and the methods you have to write detection content for fall into roughly six categories:

1. Phishing — almost always with weaponised office documents or browser-exploit links, sometimes preceded by social-media reconnaissance, increasingly accurate at impersonating real internal correspondents.
2. Credential reuse and password-database attacks — the LinkedIn breach last month is the latest in a sequence stretching back through the 2009 RockYou dump and earlier, and the operational pattern is the same: an attacker downloads a dump of someone else's user base, runs the credentials against your authentication endpoints, and walks in through the front door using a real user's password.
3. SQL injection — still everywhere, as I wrote up in September, still finding its way into production code, still the entry vector for a meaningful proportion of public-spectacle dumps.
4. Supply-chain compromise — RSA SecurID, DigiNotar, the Symantec source-code theft, Flame's Microsoft Update path — getting more sophisticated and harder to detect from the customer side.
5. Spear-phishing with privilege escalation through endpoint vulnerabilities — the standard APT shape: Aurora, Duqu, and now Flame, with the lateral-movement and exfiltration phases varying widely but the entry technique converging on weaponised email.
6. What I have been calling "the lateral motion that should have been impossible" — credentials reused into a SaaS administration interface (the HBGary Federal Google Apps pattern), administrative credentials handed over an IM channel to an attacker believed to be Hoglund (the rootkit.com incident from the same campaign), or a conference-call PIN obtained from an intercepted email invitation (the FBI conference-call leak).
That last category is messy because it covers a range of social-engineering and trust-violation patterns, but it is the one that has caused the most post-breach embarrassment over the past three years.
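The dump-replay pattern in the credential-reuse category is mechanical enough to sketch from the defender's side. A minimal illustration, assuming the dump circulates as unsalted SHA-1 (as the LinkedIn material did, which is what makes membership testing a straight hash-and-lookup); the dump contents and function names here are stand-ins, not anything from a real feed:

```python
import hashlib

# Stand-in leaked-hash set. The LinkedIn dump circulated as unsalted
# SHA-1 digests, so checking a candidate password against it is a
# single hash followed by a set lookup.
leaked_sha1 = {
    hashlib.sha1(pw.encode("utf-8")).hexdigest()
    for pw in ("linkedin", "password1", "letmein")  # stand-in dump contents
}

def appears_in_dump(candidate_password):
    """Return True if the candidate password hashes into the leaked set."""
    digest = hashlib.sha1(candidate_password.encode("utf-8")).hexdigest()
    return digest in leaked_sha1

# At password-set time, reject anything already present in a public dump,
# which cuts off the replay before the attacker ever reaches your endpoint.
print(appears_in_dump("letmein"))        # True
print(appears_in_dump("correct horse"))  # False
```

The same lookup works in the other direction for the attacker, which is exactly why an unsalted dump is so dangerous: the cost of testing a stolen credential is one hash.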
What this means for detection content is that the rules I am writing fall into a smaller number of detection patterns than the threat-intelligence vendors' marketing suggests:

- A phishing-detection layer, looking for indicators in inbound mail and in clicked-link traffic.
- A credential-reuse-monitoring layer, watching for authentication patterns that suggest an attacker is testing dumped credentials against the authentication endpoint.
- A SQL-injection-detection layer, watching for the WAF-bypass patterns tied to the application stacks each client runs.
- A supply-chain-anomaly layer, watching for outbound connections to update-server infrastructure that do not match the patterns established as normal.
- An APT-shape layer, looking for the lateral-movement and persistence patterns that the post-spear-phishing phases produce.
- A procedural-violation layer — the least technical of the lot — looking for unusual access patterns, unusual administrative actions, and unusual cross-system credential usage that suggest a real human has obtained access they should not have.
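The credential-reuse-monitoring layer reduces, at its simplest, to counting distinct usernames failing authentication from one source inside a short window: one user fumbling their own password fails repeatedly on one account, while dumped-credential testing fails once each across many accounts. A minimal sketch, with made-up thresholds and event tuples — the production rules are tuned per client estate:

```python
from collections import defaultdict

# Hypothetical failed-authentication events: (epoch_seconds, source_ip, username).
events = [
    (0,   "198.51.100.7", "alice"),
    (15,  "198.51.100.7", "bob"),
    (30,  "198.51.100.7", "carol"),
    (45,  "198.51.100.7", "dave"),
    (600, "203.0.113.9",  "alice"),   # a lone user mistyping a password
]

def credential_testing_sources(events, window=300, distinct_users=3):
    """Flag source IPs that fail logins against many distinct accounts
    inside a short window -- the shape of dumped-credential testing."""
    flagged = set()
    by_source = defaultdict(list)
    for ts, src, user in sorted(events):
        by_source[src].append((ts, user))
    for src, attempts in by_source.items():
        for start_ts, _ in attempts:
            users = {u for t, u in attempts if start_ts <= t < start_ts + window}
            if len(users) >= distinct_users:
                flagged.add(src)
                break
    return flagged

print(credential_testing_sources(events))  # {'198.51.100.7'}
```

The thresholds are the whole game here: too tight and a shared NAT egress point lights up constantly, too loose and a patient attacker slides underneath.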
The reason this consolidation matters is that it changes the shape of what the SOC analysts have to learn. The threat-intelligence vendor literature treats every campaign as essentially novel and requires the analyst to internalise a long list of indicators-of-compromise specific to each named threat. The reality I have been seeing while writing the detection content is that the underlying methods are repetitive enough that an analyst who understands the six categories well — really well, with the technical depth to recognise variants — can detect most of what shows up. The named-campaign indicators are useful as confirmation but they are not the primary detection signal. The primary detection signal is the methodological pattern. Hutchins, Cloppert, and Amin's "Cyber Kill Chain" paper from last year is the closest published framework I have seen to what I have been arriving at independently — they organise around the attack lifecycle (reconnaissance, weaponisation, delivery, exploitation, installation, command-and-control, actions on objectives) rather than around the six methodological categories I have been using, but the underlying argument is the same: defenders should think structurally about the adversary's method, not categorically about named groups.
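The two framings line up more neatly than you might expect, because each of my categories concentrates its detectable activity in a small number of kill-chain phases. A sketch of the correspondence — the phase assignments are my own reading of where our detection content actually fires, not anything from the Hutchins, Cloppert, and Amin paper, and the category names are the shorthand I use internally:

```python
# Kill-chain phases as published by Hutchins, Cloppert, and Amin.
KILL_CHAIN = ("recon", "weaponisation", "delivery", "exploitation",
              "installation", "command-and-control", "actions-on-objectives")

# My own mapping of the six methodological categories onto the phases
# where detection content for them tends to fire; illustrative only.
CATEGORY_DETECTION_PHASES = {
    "phishing":             {"delivery", "exploitation"},
    "credential-reuse":     {"delivery", "actions-on-objectives"},
    "sql-injection":        {"exploitation"},
    "supply-chain":         {"installation", "command-and-control"},
    "apt-spear-phish":      {"delivery", "installation", "command-and-control"},
    "procedural-violation": {"actions-on-objectives"},
}

def detectable_at(phase):
    """Which categories have detection content firing at a given phase."""
    return sorted(c for c, phases in CATEGORY_DETECTION_PHASES.items()
                  if phase in phases)

print(detectable_at("delivery"))
```

The useful property of the mapping is coverage checking: any kill-chain phase where no category fires is a phase where the SOC is structurally blind.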
This is going into the engagement-team training material I have been drafting. The structural argument is that we want analysts who can think about the adversary's methods rather than analysts who can recall a list of indicators. The training programme will cover the six categories with substantial worked examples from the past three years of public incidents, and the assessment will be on the analyst's ability to recognise variants of those categories rather than on their memorisation of named-threat indicators. This is not how most SOC analyst training in the UK is done; it is how I think it should be done, and the SOC build gives me the opportunity to test the proposition operationally.
The other thing this exercise has reinforced is how poorly the standard threat-intelligence subscriptions actually serve a SOC of the size we are building. The major commercial threat-intel feeds are optimised for very large SOCs with hundreds of analysts who can absorb thousands of named-threat indicators per day. A SOC of our shape does not need that volume of indicators; it needs a smaller, sharper set of detection content tuned to the actual threats the client estate sees. The cost-benefit of the major commercial subscriptions is therefore worse for a small SOC than the marketing suggests. We will run Emerging Threats for the open feeds and supplement with the limited commercial feed I evaluated in May; the bulk of the detection content will be locally written.
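"Smaller and sharper" is, in practice, a filtering problem: take whatever indicator volume the feeds provide and keep only what maps onto the categories the client estate is actually exposed to. A toy sketch of the idea — the feed records, category names, and client profile here are all invented for illustration:

```python
# Hypothetical indicator records from a commercial feed. A very large SOC
# can absorb all of them; a SOC of our shape keeps only what maps onto
# the client estate.
feed = [
    {"indicator": "evil-update.example", "category": "supply-chain"},
    {"indicator": "spray.example",       "category": "credential-reuse"},
    {"indicator": "scada-c2.example",    "category": "ics-specific"},
    {"indicator": "phish-kit.example",   "category": "phishing"},
]

# Assumed exposure profile for a client with no ICS footprint.
CLIENT_PROFILE = {"phishing", "credential-reuse", "sql-injection",
                  "supply-chain", "apt-spear-phish", "procedural-violation"}

def relevant(feed, profile):
    """Keep only indicators whose category the estate is exposed to."""
    return [record for record in feed if record["category"] in profile]

kept = relevant(feed, CLIENT_PROFILE)
print(len(feed), "->", len(kept))  # 4 -> 3
```

The filtering is trivial; the work is in maintaining an honest exposure profile per client, which is exactly the tuning the big subscriptions do not do for you.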
Verizon's annual Data Breach Investigations Report for 2012, which dropped in March, is the corroborating data point I keep coming back to: their finding that ninety-six per cent of breaches were not "highly difficult" to execute aligns with what I have been seeing in the engagement work. The defensive problem in 2012 is not that the adversaries are too sophisticated for us to defend against; it is that we have, collectively, not implemented defences against the unsophisticated attacks they are actually running.
The rumoured Saudi Aramco incident that several of my correspondents have been mentioning has, in the last three days, started showing the shape of something operationally substantial. I am holding off writing about it until there is publicly verifiable material; the next post may well be on it, depending on what surfaces over the next fortnight.