The Copilot-for-security wave: what they actually do

Post 3 of the AI series. Microsoft Security Copilot, CrowdStrike Charlotte, SentinelOne Purple, Google Sec-PaLM — the wave of LLM-powered security assistants. What they actually do well, what they do less well, and how the framing reads against EmilyAI.

_Post 3 of the AI in cyber series._

The vendor "Copilot for security" category has gone, in eighteen months, from announcement to standard line item on most enterprise security RFPs. Microsoft announced Security Copilot in March 2023 and made it generally available in April 2024. CrowdStrike's Charlotte AI followed. SentinelOne's Purple AI too. Google's Sec-PaLM variant. A dozen smaller vendors with similar pitches.

This post is about what these products actually do, what they do well, what they do less well, and where the framing pulls in directions that I think security teams should be alert to. The comparison to EmilyAI is structural rather than feature-by-feature.

What the category is, in one paragraph

A security copilot is, in essentially every current implementation, a large language model with security-domain prompts, fine-tuning or retrieval-augmented generation on the vendor's security knowledge base, an integration into the vendor's own platform (SIEM, EDR, XDR), and a conversational interface where security analysts can ask natural-language questions, generate queries in the platform's native query language, summarise alerts and cases, write first-draft incident reports, and explore datasets. The product is, in essence, an LLM that knows the vendor's platform reasonably well. The pricing is on top of the platform subscription, often per-seat or per-query.

What they do well

Three things, fairly assessed.

Query translation. Translating "show me all logons by service accounts in the last 24 hours, excluding the maintenance window" into the platform's native SIEM query language is the single most universally useful capability of these products. Analysts who have not memorised the platform's query syntax (and most have not, because most engineers meet several such languages over a career) can move faster.
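To show what that request actually encodes, here is a minimal Python sketch of its three conditions, independent of any platform's query language. The `Logon` record, its field names, and the maintenance window times are hypothetical, chosen only for illustration; a real copilot would emit the platform's native query, not Python.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Logon:
    # Hypothetical event shape; real platforms use their own schemas.
    account: str
    account_type: str  # assumed values: "service" or "user"
    timestamp: datetime


def service_logons_last_24h(events, now, window_start, window_end):
    """Logons by service accounts in the 24 hours before `now`,
    excluding anything inside [window_start, window_end)."""
    cutoff = now - timedelta(hours=24)
    return [
        e for e in events
        if e.account_type == "service"
        and cutoff <= e.timestamp <= now
        and not (window_start <= e.timestamp < window_end)
    ]


now = datetime(2024, 6, 1, 12, 0)
events = [
    Logon("svc-backup", "service", datetime(2024, 6, 1, 9, 0)),   # kept
    Logon("svc-backup", "service", datetime(2024, 6, 1, 2, 30)),  # inside the maintenance window
    Logon("alice", "user", datetime(2024, 6, 1, 10, 0)),          # not a service account
    Logon("svc-deploy", "service", datetime(2024, 5, 30, 9, 0)),  # older than 24 hours
]
matches = service_logons_last_24h(
    events,
    now,
    window_start=datetime(2024, 6, 1, 2, 0),
    window_end=datetime(2024, 6, 1, 4, 0),
)
print([(e.account, e.timestamp.isoformat()) for e in matches])
```

The value of the copilot is that the analyst states these three conditions in a sentence and does not have to remember how each platform spells them.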

Alert summarisation. Reducing a long alert chain with twenty fields and a half-page of context into a paragraph that captures what happened, what looks unusual, and what to check first is a genuine cognitive aid. The first draft saves real time.

Knowledge-base navigation. The vendor's own documentation is, for the larger platforms, vast. A copilot that can answer "how do I configure DMARC reporting in Defender for Office 365" without an engineer pulling open three browser tabs is, again, a real saving.

These three capabilities, taken together, make the analyst's job marginally faster. Marginally faster, applied to the volume of alerts a busy SOC sees, is a real saving in human-hours. I have spoken to several security teams who have found the products genuinely useful at the analyst desk for these specific things.

What they do less well

Three things, also fairly assessed.

Verdict generation. Asking the copilot "is this event malicious" and using the answer as a verdict, rather than as an aid to the analyst's own decision, is not what the products are designed for and, in the implementations I have seen, not what they do well. The non-determinism point from the last post applies. The audit trail does not survive the question "which model said what, when".
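As a rough illustration of what answering "which model said what, when" would require, here is a minimal Python sketch of an audit record. Every field name and the `record_verdict` helper are assumptions made for this example, not any vendor's schema or API.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class VerdictRecord:
    # All field names are hypothetical, for illustration only.
    model_id: str        # which model
    model_version: str   # which exact build of it
    prompt_sha256: str   # what it was asked (hash of the full prompt)
    output: str          # what it said
    recorded_at: str     # when, as an ISO 8601 UTC timestamp


def record_verdict(model_id, model_version, prompt, output, now=None):
    """Build an immutable record of one model answer."""
    now = now or datetime.now(timezone.utc)
    return VerdictRecord(
        model_id=model_id,
        model_version=model_version,
        prompt_sha256=hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        output=output,
        recorded_at=now.isoformat(),
    )


rec = record_verdict(
    "example-copilot",
    "2024-05-01",
    "is this event malicious? <alert body>",
    "Likely benign: scheduled task matches known baseline.",
    now=datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc),
)
print(json.dumps(asdict(rec), indent=2))
```

A record like this does not make the model deterministic. It stores the output because replaying the same prompt may not reproduce it.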

Cross-platform reasoning. Each vendor's copilot is, in practice, anchored to that vendor's platform. Reasoning across a Splunk SIEM, a Defender endpoint, a Mimecast email gateway, and an Okta identity provider, which is what a real SOC's day looks like, is not a thing any single vendor's copilot does cleanly. The architecture of "copilot lives inside the vendor's platform" is exactly the architecture that does not solve the SOC's actual problem.

Tier-two triage. Doing the "I have looked at this alert, I have considered the context, I have weighed the indicators, I have made a verdict, I have written the case" work is not what these products do. They produce drafts, summaries, and queries. They do not produce decisions; they produce material that helps analysts make decisions.

The structural framing question

This is where the comparison to EmilyAI becomes interesting, and where I want to be careful about the framing.

The Copilot category is positioned as "AI that augments your analysts". The framing is appealing because it sidesteps the harder question of AI that makes decisions. The augmentation is real; the analyst remains in charge; the audit trail belongs to the analyst, not the AI; the regulator can be told the same thing they have always been told: "we have human analysts reviewing alerts, supported by tooling".

EmilyAI is positioned differently. She is not augmenting analysts; she is doing tier-two triage as her primary work, with human analysts above her for case validation, exception handling, and the cases she escalates. The framing is "AI as the analyst, humans as the supervisors", rather than "AI as the tool, humans as the analysts". The audit trail belongs to EmilyAI as a system, not to a human; the regulator has to be told something different.

Neither framing is wrong. They are different products solving different problems. The interesting question is which is the right framing for which customer, and the answer depends on the customer's analyst capacity, their alert volume, their regulatory posture, and their appetite for the operational characteristics that come with each.

A specific comparison: alert handling

To make this concrete, take a customer with a SIEM that produces 50,000 alerts per day. Without AI, this volume requires a team of analysts whose primary work is filtering; most alerts will not be acted on. With a security copilot, the analyst's per-alert handling time is reduced; the team stays the same size, each analyst works faster, and the unhandled alert volume drops slightly. With an EmilyAI-shaped system, the bulk of alerts are handled by the AI directly; the analysts handle the cases the AI escalates, supervise the AI's verdicts on a sample basis, and focus on the work that requires human reasoning. The team can be materially smaller, or the same team can cover materially more customers.
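The arithmetic behind that comparison can be sketched in a few lines of Python. Only the 50,000 alerts per day comes from the example; the team size, handling times, AI closure share, and sampling rate are assumptions picked for illustration, not measurements of any product.

```python
SECONDS_PER_SHIFT = 8 * 60 * 60


def handled_per_day(analysts, seconds_per_alert):
    """Alerts a team can work through if each analyst works one shift."""
    return analysts * SECONDS_PER_SHIFT // seconds_per_alert


def unhandled(alerts, handled):
    return max(alerts - handled, 0)


alerts_per_day = 50_000  # the figure from the example

# Every value below is an assumption chosen for illustration.
analysts = 20
manual_seconds = 360     # 6 minutes per alert without AI
copilot_seconds = 288    # 20% faster per alert with a copilot
ai_closed_percent = 99   # share the AI closes without escalation
sample_percent = 1       # share of AI-closed alerts a human spot-checks

baseline = handled_per_day(analysts, manual_seconds)
with_copilot = handled_per_day(analysts, copilot_seconds)

ai_closed = alerts_per_day * ai_closed_percent // 100
escalated = alerts_per_day - ai_closed
sampled = ai_closed * sample_percent // 100
human_queue = escalated + sampled

print("no AI:        handled", baseline, "unhandled", unhandled(alerts_per_day, baseline))
print("copilot:      handled", with_copilot, "unhandled", unhandled(alerts_per_day, with_copilot))
print("AI verdicts:  human queue", human_queue, "(escalated", escalated, "+ sampled", sampled, ")")
```

Under these assumed inputs the copilot moves the team from 1,600 to 2,000 handled alerts per day, while the verdict-making route leaves the humans a queue of 995 items. The shape of the result is the point, not the numbers; substitute your own figures.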

The copilot route is lower-risk operationally. The EmilyAI route is higher-leverage. Different customers reasonably make different choices.

What I think is going to happen

Three predictions, with the usual humility about predictions.

The copilot category will grow but commoditise. Within two to three years, every major security platform will have one. The differentiation will not be the copilot itself but the platform underneath. Customers will, in many cases, end up paying for the copilot because the platform is what they have, rather than choosing it on its own merits.

A category of AI that actually makes verdicts will emerge. Not from the current copilot vendors, who are structurally not set up to ship this, but from smaller domain-specialist firms and from incumbents who have been building purpose-specific systems for years. The verdict-making category will be smaller, more expensive, and harder to evaluate, because the questions to ask it (about determinism, audit, regulatory defensibility) are harder than the questions to ask a copilot.

The augmentation framing will be revisited. Within five years, the question "should the AI make decisions" will not be the same conversation it is in 2024. The answer will be domain-specific and regulator-shaped.

What this month looks like

If you are evaluating a security copilot, two practical recommendations.

One: identify the three analyst workflows you would expect to speed up. Test each in the trial period. Measure the time saving. The marketing claims are uniformly more impressive than the measured saving; both are usually positive.

Two: ask the deterministic-inference questions from the last post. The answers will tell you whether the vendor has thought about audit at production scale.
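For the first recommendation, a minimal Python sketch of the measurement might look like this. The workflow names and all timings are hypothetical placeholders; the idea is simply to compare median task times with and without the copilot, using your own trial data.

```python
from statistics import median

# Hypothetical trial timings in seconds per task; replace with measured values.
timings = {
    "query writing": {"without": [300, 420, 360], "with": [180, 240, 200]},
    "alert summary": {"without": [600, 540, 660], "with": [480, 450, 510]},
    "docs lookup": {"without": [240, 200, 280], "with": [120, 90, 150]},
}


def median_saving(without, with_copilot):
    """Return (seconds saved, fraction saved) based on median task time."""
    base = median(without)
    saved = base - median(with_copilot)
    return saved, saved / base


results = {
    name: median_saving(t["without"], t["with"]) for name, t in timings.items()
}
for name, (saved, fraction) in results.items():
    print(f"{name}: {saved} s saved per task ({fraction:.0%})")
```

Medians are used here, as a design choice, so that one unusually slow task in a small trial does not dominate the comparison.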

In six weeks: the hexagonal lesson. Why vendor agnosticism turned out to be a structural property worth designing for, and what that means for buyers looking at multi-platform AI deployments.