Agentic AI, year one: the demo vs the deployment

AI agents in cyber operations have been demoed everywhere this year. The agent that actually ships looks different from the demo. This is the honest read after twelve months, and a look at the shape of agent EmilyAI already is, which is not an accident.

_Post 8 of the AI in cyber series._

Agentic AI has been the phrase of 2024 in the cyber AI space, and not just there. Devin's launch in March, the proliferation of LangChain-based agent frameworks, the AutoGPT successors, the demos of AI security analysts doing end-to-end investigation chains — all of this has been the visible surface of a category that the industry has been claiming is about to transform security operations.

It is the end of November. The transformation has not, in any honest read, happened. The demos have improved. The production deployments are still scarce. The cases where an autonomous agent makes consequential cyber decisions without human ratification are, in the real customer base I see, almost non-existent.

This post is an honest accounting of year one of agentic AI in cyber operations. The framing is comparative, but this time the comparison is not EmilyAI versus the new entrants. It is the demo versus the shape of agent that production reality actually takes, regardless of vendor.

What we were promised

The 2024 demo cycle had a recognisable shape. An analyst gives a prompt — investigate this suspicious activity on host X — and an AI agent autonomously plans a sequence of actions, executes them against the customer's environment, reasons about the results, plans more actions, and eventually produces a complete investigation report. The agent uses MCP-style tool integration, browses log servers, runs queries, correlates indicators, and presents conclusions. The video is impressive. The promise is that this will replace the tier-two analyst within a year or two.

What actually shipped

Three things, in roughly the volume they appeared.

Internal automation tooling, framed as agents. A lot of what is now called agentic in security products is the same workflow automation that has existed for years, with an LLM wrapper added on top. The LLM converts natural-language requests into the existing automation's inputs and produces narrated outputs. The agency is mostly cosmetic. The work being done is unchanged.

Augmentation agents for human analysts. Tools that, when an analyst is investigating a case, can be asked to do specific sub-tasks autonomously — check whether this IP has been seen before, correlate these events, summarise this thread. The agent's autonomy is bounded by the analyst's instruction. The human remains in the loop for any consequential action.

Demonstration prototypes that do not ship. A long tail of products that exist on conference stages and in YouTube videos and not, in any meaningful number, in production customer environments. The gap between the demo and the production deployment is real and has been wider than the AI press coverage suggests.

Why the gap is structural

Three reasons the autonomous agent shape is harder to ship than the demo suggests.

Consequential actions need defensible decision-making. When an AI agent decides to isolate a host, block a user account, push a configuration change, or contact a customer, the chain of reasoning that produced the decision becomes evidence. "The agent decided" is not, in itself, a defensible answer if the decision turns out to be wrong. The audit and reproducibility properties that determinism gives you, which post 2 of this series describes, become regulatory necessities once the agent has any consequential authority.

The error mode is qualitatively different. A misclassifier produces wrong verdicts; an autonomous agent produces wrong actions. The difference is the speed of harm. A misclassifier wrong in one direction (false negative) is the failure mode security teams already understand. An autonomous agent wrong in one direction (incorrect blocking action) is a different failure mode that propagates through the customer environment in seconds.

The reasoning chain that the demo proudly shows is also the part that is not reproducible. The agent reaches its conclusion via a sequence of LLM calls whose reasoning traces vary across runs. The same investigation, run twice, may produce different actions. The CISO who wants to sign off on agentic deployment is asked to sign off on a system whose behaviour cannot be specified deterministically.

What EmilyAI is, in this framing

One point is worth being explicit about, because it is a framing I have so far avoided stating too directly.

EmilyAI is, in the strict technical sense, an agent. She reads from her environment (SIEM events), reasons about them (the funnel), takes actions (writes verdicts, escalates cases, notifies analysts), and learns from feedback (continuous training from analyst dispositions). She is not a chatbot. She is not augmentation. She makes verdicts as her primary output and the verdicts have downstream consequences.

What she is not is autonomous in the LLM-agent sense. Her decision-making is bounded by the canonical schema, the trained model, and the action vocabulary the platform supports. She does not browse the web, run arbitrary queries, or take actions outside the structured set of verdict types. The reasoning is, by construction, deterministic and reproducible. The blast radius of a wrong verdict is bounded: either the wrong case is escalated to a human, or a case that should have been escalated is not. The human-supervised correction loop is short and well-understood.

This is, in my view, the shape of agent that actually ships in 2024. Constrained agency, with a tight action vocabulary, deterministic decision-making, and a feedback loop where humans correct mistakes and the corrections become training data. The demos of unbounded agency are interesting research and probably the shape of something that ships in five years. They are not the shape of anything I would sell to a regulated customer in 2024.

The shape of agent that does ship

A specific characterisation, which I think the field will converge on through 2025.

A bounded action vocabulary. The agent can take actions from a specific list. New actions are added by engineering, not by the model deciding it can take them.
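A minimal sketch of what a bounded vocabulary can look like in code, assuming a Python service sits between the model and the platform. The action names and the `parse_action` helper are hypothetical, chosen for illustration; they are not taken from any particular product.

```python
from enum import Enum


class Action(Enum):
    """The complete action vocabulary. Extending it is a code change."""
    CLOSE_BENIGN = "close_benign"
    ESCALATE_TO_ANALYST = "escalate_to_analyst"
    NOTIFY_ANALYST = "notify_analyst"


def parse_action(requested: str) -> Action:
    """Map a model-proposed action name onto the fixed vocabulary.

    Anything outside the vocabulary is refused rather than improvised.
    """
    try:
        return Action(requested)
    except ValueError:
        raise PermissionError(f"action not in vocabulary: {requested!r}") from None


# A name in the vocabulary resolves to a known action.
print(parse_action("escalate_to_analyst"))

# A name the model invented is rejected before anything runs.
try:
    parse_action("isolate_host")
except PermissionError as err:
    print(err)
```

The point of the enum is that the model can only select from the list. Adding `isolate_host` means an engineer edits `Action`, which is a reviewable change.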

Deterministic decision-making at the consequential layer. Where the agent takes consequential actions, the decision logic is deterministic. The LLM may be in the loop for advisory tasks; the action-taking layer is deterministic.

A human-supervised correction loop. Humans review a sample of the agent's actions. Errors are diagnosed against a reproducible decision trail. Corrections become training data. The agent improves over time without ever taking unsupervised consequential action.

Audit trails that survive scrutiny. Each action is associated with the model version that produced it, the input it acted on, and the structured output that resulted. "Why did the agent do X?" is answerable in detail, by replay if necessary.
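As a rough illustration of the audit property, here is a sketch that records the model version, the input, and the verdict, then answers the question by replay. Everything in it is an assumption for the example: the `decide` rule stands in for whatever deterministic decision layer a real system uses, and the version label, field names, and threshold are invented.

```python
import hashlib
import json
from dataclasses import dataclass

MODEL_VERSION = "2024.11-example"  # hypothetical version label


def decide(event: dict) -> str:
    """Stand-in for a deterministic decision layer: same input, same verdict."""
    return "escalate" if event.get("failed_logins", 0) >= 10 else "close_benign"


def canonical(event: dict) -> str:
    """Serialise the input in a stable form so its hash is reproducible."""
    return json.dumps(event, sort_keys=True, separators=(",", ":"))


@dataclass(frozen=True)
class AuditRecord:
    model_version: str
    input_sha256: str
    input_json: str
    verdict: str


def act_and_record(event: dict) -> AuditRecord:
    """Make the decision and capture what is needed to explain it later."""
    payload = canonical(event)
    return AuditRecord(
        model_version=MODEL_VERSION,
        input_sha256=hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        input_json=payload,
        verdict=decide(event),
    )


def replay_matches(record: AuditRecord) -> bool:
    """Re-run the stored input and check the recorded verdict is reproduced."""
    if record.model_version != MODEL_VERSION:
        raise ValueError("replay requires the model version that made the decision")
    digest = hashlib.sha256(record.input_json.encode("utf-8")).hexdigest()
    if digest != record.input_sha256:
        return False  # the stored input no longer matches its hash
    return decide(json.loads(record.input_json)) == record.verdict


record = act_and_record({"host": "ws-042", "failed_logins": 14})
print(record.verdict, replay_matches(record))
```

Replay only works because `decide` is deterministic. If the decision came from a sampled LLM call, the stored record could describe what happened but could not be used to reproduce it.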

Blast-radius engineering. The agent's permissions are scoped by what it can plausibly need. A SOC agent does not have admin access to production systems by default. The principle of least privilege applies as much to AI agents as to human ones.
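One way to express that scoping, sketched under assumptions: the role names and action names below are hypothetical, and a real deployment would enforce this in the platform's access-control layer rather than in a dictionary. The shape is what matters: deny by default, and list what each role may do.

```python
# Hypothetical roles and action names, for illustration only.
ROLE_SCOPES: dict[str, frozenset[str]] = {
    "soc_triage_agent": frozenset(
        {"write_verdict", "escalate_case", "notify_analyst"}
    ),
    "human_responder": frozenset(
        {"write_verdict", "escalate_case", "notify_analyst", "isolate_host"}
    ),
}


def is_permitted(role: str, action: str) -> bool:
    """Deny by default: unknown roles and unlisted actions are refused."""
    return action in ROLE_SCOPES.get(role, frozenset())


# The agent can escalate, but host isolation stays with the human role.
print(is_permitted("soc_triage_agent", "escalate_case"))
print(is_permitted("soc_triage_agent", "isolate_host"))
```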

These properties are, in 2024, more visible in the systems that have been quietly running in production for years than in the systems being demoed. The lesson is not that production-ready agentic AI is hard; the lesson is that the shape it takes when it actually ships is more constrained than the demo, and the constrained shape is durable.

What I think happens through 2025

Three predictions.

**The first wave of agents in production deployments will be in the augmentation layer, not the action layer.** Agents that propose actions for human analysts to approve. The autonomy is bounded by the approval. The value comes from the speed of proposal-generation.
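The propose-then-approve pattern is simple enough to sketch. This is an illustrative outline, not a description of any shipping product: `Proposal`, `ApprovalQueue`, and the example action are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Proposal:
    action: str
    target: str
    rationale: str
    approved_by: Optional[str] = None


@dataclass
class ApprovalQueue:
    pending: List[Proposal] = field(default_factory=list)
    executed: List[Proposal] = field(default_factory=list)

    def propose(self, proposal: Proposal) -> Proposal:
        """The agent's only capability: put a proposal in front of a human."""
        self.pending.append(proposal)
        return proposal

    def approve(
        self,
        proposal: Proposal,
        analyst: str,
        execute: Callable[[Proposal], None],
    ) -> None:
        """Execution happens only here, and only with a named approver."""
        if proposal not in self.pending:
            raise ValueError("proposal is not awaiting approval")
        self.pending.remove(proposal)
        proposal.approved_by = analyst
        execute(proposal)
        self.executed.append(proposal)


log = []
queue = ApprovalQueue()
p = queue.propose(Proposal("isolate_host", "ws-042", "beaconing to a known C2 address"))

# Nothing has run yet; the proposal waits until an analyst approves it.
queue.approve(
    p,
    analyst="analyst-on-shift",
    execute=lambda prop: log.append((prop.action, prop.target)),
)
print(log)
```

The agent never holds the `execute` callable. It can generate proposals as fast as it likes, which is where the value is, but the action path runs through `approve`.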

**The verdict-making category — what EmilyAI does — will be reframed as agent-shaped.** The same systems will be in production. The terminology will catch up. Vendors who have been doing it for years will find themselves in the agentic AI category for marketing purposes.

**The unbounded-agency demos will continue, in research and on conference stages.** They will not ship in regulated production environments at any meaningful scale through 2025. The 2026 conversation may be different.

What is next

In six weeks: the year-in-cyber-AI 2024 retrospective. What turned out to be real, what turned out to be hype, and what 2025 looks like.