Agents in production, eighteen months on

Post 13 of the AI series. The agent demos at RSA and Black Hat have got slicker. The agent in production cyber operations has, mostly, not arrived. The honest 18-month read on a category whose marketing has run ahead of its engineering.

Eighteen months ago I wrote post 8, a piece that argued the agent demos of 2024 were ahead of the agent deployments. The eighteen months since have produced more demos, more polished demos, and a small but real population of agents shipping in production. The category has matured. The honest read of what is in production is, however, a long way from what the marketing claims.

This is a status post rather than a structural one. It covers what has actually shipped, what is shipping now, where the boundary between demo and deployment sits, and how the constrained-agency shape has held up as the conversation has matured.

What has actually shipped

I have categorised the public claims I have seen this year into four buckets, by deployment maturity.

Bucket one: shipped at scale, in production, with consequential authority. This is the smallest bucket. I count, by name, perhaps five products that fit this description in the cyber security category. The shape of all five is the constrained-agency shape from post 8: bounded action vocabulary, deterministic decision-making at the consequential layer, human-supervised correction loop. None of them is the unbounded operator agent that the 2024 demos suggested. EmilyAI is one of the five.

Bucket two: shipped at scale, in production, with augmentation authority only. A larger bucket: Microsoft Security Copilot, CrowdStrike Charlotte AI, SentinelOne Purple AI, and a long list of vendor copilots that have crossed from announcement to operational deployment over the past year. The agency is bounded to propose, summarise, query, narrate. Consequential actions remain in human hands. The deployment is now common. The accurate framing is augmentation, not agency, although the marketing increasingly uses the word agentic.

Bucket three: shipped in pilot, with consequential authority bounded by IAM. A smaller bucket of products where the vendor has shipped an agent that can take consequential actions in the customer's environment, but where the customer's identity and access controls have been used to limit what those actions can do. These are pilots, not yet deployed across the customer base. The deployment posture is: we will let the agent do X within the scope of the permissions we have given it, and we have chosen those permissions to limit the blast radius. This is a defensible posture; it is also, in my view, the place where most of the practical learning of the next year will happen.

Bucket four: demos. The largest bucket. Products that have been announced, demoed at conferences, written up in trade press, and are not yet in production at customers with consequential authority. The demos have improved. The gap between demo and deployment has, in eighteen months, narrowed less than I would have predicted from the trajectory implied by the demos themselves.

The shape of the gap

Three patterns I see consistently when an agent moves from bucket four to bucket three.

The action vocabulary gets bounded. The agent that can do anything in your environment gets restricted, by the customer's procurement team, to a specific list of actions the customer has decided are acceptable. The agent now does fewer things; the things it does are the ones that have been thought about specifically.
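A minimal sketch of what a bounded action vocabulary can look like in code, assuming the agent proposes actions by name. The three action names, `parse_action` and `ActionNotPermitted` are all hypothetical, chosen for illustration; a real list would come out of the customer's own review.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical vocabulary for illustration only.
    ISOLATE_HOST = "isolate_host"
    DISABLE_ACCOUNT = "disable_account"
    OPEN_TICKET = "open_ticket"


class ActionNotPermitted(Exception):
    """Raised when a proposed action is outside the approved vocabulary."""


def parse_action(proposed: str) -> Action:
    """Map a proposed action name onto the vocabulary, or refuse it."""
    try:
        return Action(proposed.strip().lower())
    except ValueError:
        # Anything not explicitly listed is rejected, never guessed at.
        raise ActionNotPermitted(
            f"not in approved vocabulary: {proposed!r}"
        ) from None
```

The design point is that the default is refusal: an action the customer has not thought about specifically has no representation at all, so it cannot be dispatched by accident.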

The autonomy gets phased. The agent starts in shadow mode (decides what to do, does not actually do it, logs the proposed action). It then moves to recommend mode (proposes the action to a human who decides), then to ratify mode (takes the action subject to a human's positive confirmation), and finally to unsupervised operation within scope. The progression takes months per stage.
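The four stages can be sketched as a single gate that every proposed action passes through. This is an illustration under my own assumptions, not any vendor's implementation: the `Mode` enum, the `gate` function and the string outcomes are hypothetical names.

```python
from enum import IntEnum


class Mode(IntEnum):
    SHADOW = 0        # decide and log, never act
    RECOMMEND = 1     # propose to a human, who decides
    RATIFY = 2        # act only after positive human confirmation
    UNSUPERVISED = 3  # act alone, within the approved scope


def gate(mode: Mode, human_confirmed: bool = False) -> str:
    """Return what happens to a proposed action in the given mode."""
    if mode is Mode.SHADOW:
        return "log_only"
    if mode is Mode.RECOMMEND:
        return "send_to_human"
    if mode is Mode.RATIFY:
        return "execute" if human_confirmed else "await_confirmation"
    return "execute"
```

Keeping the mode as one explicit setting means moving a deployment from one stage to the next is a configuration change that can be reviewed and reversed, rather than a code change.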

The audit obligation gets engineered. The customer's compliance team requires a clean audit trail of what the agent did, on what input, with what reasoning. The vendor has to retrofit this if it was not designed in. The shape that ships is the shape that survived this audit retrofit.
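One way to make such a trail tamper-evident is to chain each record to the hash of the one before it. That technique is my assumption for the sake of illustration, not something any particular vendor is known to use, and `AuditRecord`, `append` and `verify` are hypothetical names. The record carries the three things the compliance team asks for: the action, the input, and the reasoning.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

GENESIS = "0" * 64  # placeholder hash for the first record in a log


@dataclass(frozen=True)
class AuditRecord:
    action: str
    input_digest: str  # SHA-256 of the input the agent saw
    reasoning: str
    prev_hash: str     # digest of the previous record in the log

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append(log: list, action: str, raw_input: bytes, reasoning: str) -> AuditRecord:
    """Add a record chained to the current tail of the log."""
    prev = log[-1].digest() if log else GENESIS
    record = AuditRecord(
        action, hashlib.sha256(raw_input).hexdigest(), reasoning, prev
    )
    log.append(record)
    return record


def verify(log: list) -> bool:
    """Check that every record points at the digest of its predecessor."""
    prev = GENESIS
    for record in log:
        if record.prev_hash != prev:
            return False
        prev = record.digest()
    return True
```

With this layout, editing any earlier record changes its digest and breaks the link from the next one, so `verify` fails. The last record is not protected by the chain alone; a real system would also need to anchor the tail somewhere outside the agent's control.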

The shape that survives all three of these — bounded action vocabulary, phased autonomy, engineered audit obligation — is the constrained-agency shape. The unbounded-agency demos do not make it through.

The customer-side learning

A separate observation that I have been hearing consistently in customer conversations through 2025.

The customer-side concerns about "AI taking actions in our environment" are not, in my experience, primarily concerns about the AI's reasoning quality. The customers I work with broadly trust that current-generation LLMs are good enough at most security reasoning tasks. The concerns are about the operational properties around the reasoning: the audit trail, the rollback path, the blast radius, the regulatory defensibility.

This is, I think, the lesson the AI security category is learning the hard way through 2025. The reasoning quality is not the bottleneck. The operational discipline around the reasoning is. The vendors who have been thinking about this for years are now finding their concerns aligned with where the customer-side conversation has arrived. The vendors who built the reasoning capability first and the operational discipline second are still doing the discipline work.

What changes in the second half of 2025

Three things I expect to land between now and the end of the year.

The constrained-agency shape becomes the named pattern. The term "constrained agency" (or some near equivalent) will be used by the major analyst firms, by the regulators, and by customer security teams as the named alternative to the "unbounded operator agent". The naming will accelerate the procurement-grade differentiation between the two shapes.

The regulatory conversation specifically about agent action authority gets formalised. The UK Cyber Security and Resilience Bill and the EU's AI Act implementation work will, in their secondary legislation, address what an autonomous agent in a regulated environment is permitted to do without human ratification. The shape this takes will matter for the next decade.

A high-profile agent-driven incident lands. I have been predicting this throughout the series; it has not happened yet. A bucket-three deployment will, somewhere, have a consequential failure that becomes public and reshapes the conversation. The shape of the failure is impossible to predict; the fact of it is, on the trajectory we are on, increasingly likely.

Where EmilyAI fits in the bucket map

For the avoidance of doubt: EmilyAI is in bucket one. She has been there since the platform reached scale in 2020 and has remained there through six years of customer deployment. The architectural properties (bounded action vocabulary, deterministic inference, continuous learning from analyst feedback, audit-grade trails on every verdict) are the ones that have allowed her to stay there without incident.

The platform's roadmap through the rest of 2025 does not move her into bucket two or three. She continues to do what she has been doing. The interesting work is integrating her with the bucket-two and bucket-three agents that her supervising human analysts are increasingly using. The reasoning models from post 7 and the operator-style agents from post 10 sit above EmilyAI in the SOC workflow, used by the humans who supervise her escalations. The boundary is clean and the architecture supports it.

What is next

In six weeks: a deeper piece on the regulatory defensibility of AI-driven verdicts. The Cyber Security and Resilience Bill drafting work is producing some specific expectations; the determinism property from post 2 is increasingly turning out to be the load-bearing one. What that means for procurement decisions in the second half of 2025.