I helped a friend's organisation with a small adversary-emulation engagement this month. The terms of the engagement: I would attempt to compromise their network from the outside, with limited prior information; they would respond as if it were a real incident; we would meet weekly to discuss findings.
The organisation is mid-sized — about 200 staff, mixed Windows and Linux, multiple sites. The findings were largely predictable. The process was instructive in ways I want to write about.
The engagement structure
A red-team engagement is not a penetration test. A penetration test asks "can we be compromised?"; an adversary emulation asks "how does our team perform when we are compromised?".
For this engagement:
Initial reconnaissance. I spent three days collecting publicly-available information about the organisation — their public website, employee names from LinkedIn-like services, their visible IP ranges. The information was substantial; an attacker investing this kind of time has a meaningful starting position.
Initial compromise. I sent a phishing email to a small set of employees. Two opened the attachment (a benign trojan-mimic); one enabled the macro, which gave me a foothold on their workstation.
Lateral movement. From the foothold, I spent two weeks moving laterally — collecting credentials, finding interesting hosts, escalating to administrative access on key systems. The movement was deliberately slow to test whether the defenders would notice.
Exfiltration. I pulled a small set of "sensitive" documents (chosen to be valuable to the organisation) and exfiltrated them through a covert channel.
Detection assessment. Throughout, I tracked which of my actions the defenders detected. The answer was: very few.
What the defenders detected
Of the roughly 60 distinct actions I performed across four weeks:
- 12 generated alerts in their existing monitoring.
- 4 of those alerts were investigated.
- 1 of the 4 investigations correctly identified the activity as suspicious.
- 0 led to detection of the active engagement.
The single correctly-identified alert was for an unusual outbound DNS query — too specific a pattern for the on-call analyst to ignore. They escalated; the senior analyst dismissed it as a likely false positive (which is, in fact, what the alert often is).
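For context on why a query like that stands out at all: a common generic heuristic for DNS-based covert channels (I do not know what their monitoring actually used; this is a sketch of the technique, with an illustrative, untuned threshold) is to flag queries whose labels look like encoded data rather than hostnames. High character entropy in a long label is the usual tell:

```python
import math
from collections import Counter

def label_entropy(label: str) -> float:
    """Shannon entropy of a DNS label's characters, in bits."""
    counts = Counter(label)
    total = len(label)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def looks_like_tunnelling(qname: str, threshold: float = 3.5) -> bool:
    """Flag a query whose longest label has unusually high entropy.

    Encoded exfiltration data tends toward near-random labels;
    ordinary hostnames ("mail", "www", "cdn-eu-west") score lower.
    Both the length cutoff and the threshold are illustrative.
    """
    labels = qname.rstrip(".").split(".")
    longest = max(labels, key=len)
    return len(longest) >= 20 and label_entropy(longest) > threshold
```

A check like this is cheap to run over query logs, and it is exactly the kind of "too specific to ignore" pattern the on-call analyst flagged.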
The pattern: alert rules existed; alerts fired; the alerts were not investigated effectively; and the one correctly-investigated alert was dismissed anyway.
What this teaches
Three things.
The detection-to-response gap is the killer. The organisation had decent monitoring infrastructure. Many of my actions did produce alerts. The alerts were not turned into response. The infrastructure investment was largely wasted because the people-and-process layer was missing.
Familiarity-with-noise is dangerous. The senior analyst who dismissed the unusual DNS alert had presumably seen many false positives that looked similar. The dismissal was reasonable from their perspective. The signal of a real incident was hidden in the noise of routine alerts.
The detection metrics need to include response. Counting alerts is easy and misleading. Counting investigations is better but still misses the right question — which is how often does an alert lead to a correct conclusion? This is a much smaller number, and it is the one that matters.
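With this engagement's numbers, the funnel makes the point plainly (a throwaway calculation, nothing more):

```python
actions = 60        # distinct attacker actions
alerted = 12        # actions that generated an alert
investigated = 4    # alerts that were investigated
correct = 1         # investigations reaching the right conclusion

# The headline number looks respectable; the end-to-end number does not.
alert_rate = alerted / actions     # 0.20
correct_rate = correct / actions   # ~0.017

print(f"alerted: {alert_rate:.0%} of actions")
print(f"correct: {correct_rate:.1%} of actions")
```

A 20% alert rate reads as a functioning monitoring stack; a 1.7% correct-conclusion rate is the number the organisation actually lives with.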
What the organisation has changed
Following the engagement debrief, three changes:
Reduced alert volume. They have systematically tuned their monitoring to reduce the false-positive rate. Less noise; the senior analyst's dismissal heuristic now produces fewer false dismissals.
Improved escalation. Specific alert types now trigger a defined escalation path rather than a judgement call. The senior analyst's role is supplemented by a procedural backstop.
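A procedural backstop of this kind can be as simple as a static routing table. The alert types and tiers below are hypothetical, not the organisation's actual taxonomy; the point is that the path is data, not a judgement call:

```python
# Hypothetical mapping from alert type to a mandatory escalation path.
# Anyone on the path may add steps; no one may unilaterally close
# the alert before the path is exhausted.
ESCALATION = {
    "dns-anomaly":       ["on-call", "senior", "incident-lead"],
    "new-admin-account": ["on-call", "incident-lead"],
    "odd-logon-hours":   ["on-call"],
}

def escalation_path(alert_type: str) -> list[str]:
    # Unknown alert types get the full path: fail closed, not open.
    return ESCALATION.get(alert_type, ["on-call", "senior", "incident-lead"])
```

Under a table like this, the dismissed DNS alert from the engagement would have reached the incident lead regardless of how routine it looked to the senior analyst.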
Practice scenarios. Quarterly tabletop exercises where an attack scenario is described and the team walks through the response. The first one was awkward; subsequent ones have been improving.
These are the forensic-readiness disciplines I have been writing about, made concrete by an actual engagement.
What I am taking from this
For my own writing: the value of red-team engagements is in the response data they produce, not the compromise data. The compromise is usually predictable; the response is the variable.
For friends running organisations: a small engagement of this kind, even informal, is enormously informative. The cost is modest; the visibility into the operational reality of the defence is substantial.
For my own work: I am thinking about whether to formalise this kind of engagement as a service for the small organisations I help informally. The market is real; the existing options are mostly large-vendor products that do not fit small organisations' budgets. A modest, careful, small-scale offering could be useful.
More as the year develops.