Year in cyber AI 2024: what was real, what was not

Post 9 of the AI series. The 2024 retrospective. Six security copilots shipped; one major outage reshaped the resilience conversation; reasoning models arrived; agents mostly did not. The honest read going into 2025.

Twelve months and eight posts into this series, this is the retrospective. What turned out to be real in 2024, what turned out to be hype, and what 2025 looks like from where I am sitting at the start of January.

I have tried to be specific. The fashion in cyber-AI commentary is to lean toward either everything is about to change or nothing has really changed. Both are wrong. Several things changed in 2024. Several others changed less than was promised. The honest position requires distinguishing between them.

Five things that were real

The copilot category became standard. Every major security platform now has an LLM-based assistant. Microsoft Security Copilot reached general availability in April, CrowdStrike's Charlotte AI and SentinelOne's Purple AI matured through the summer, and the smaller vendors have followed. The category will commoditise over the next two years. By the end of 2025 I expect the question of which copilot is included to be a procurement footnote rather than a differentiator. Post 3 describes the category as it stood at mid-year; the assessment holds.

Open-source LLMs caught up in the middle. Llama 3, Mistral, Mixtral and several smaller releases meant that, by mid-2024, an on-premises LLM deployment was a credible alternative to the hosted incumbents for most enterprise tasks. For UK firms with data-residency requirements the practical implications are large; post 5 covers them. The Llama 4 release is expected in Q1 2025; the gap will narrow further.

Reasoning models arrived as a distinct model class. OpenAI's o1 in September, Anthropic's extended-thinking variants, Google's reasoning models. A genuine step-change for the categories of work that benefit from explicit chain-of-thought — complex correlation, hypothesis generation, incident reconstruction. Post 7 covers the shape; the assessment that reasoning is the right tool for the augmentation layer and the wrong tool for high-volume triage holds.

The CrowdStrike outage reframed the resilience conversation. Not an AI incident, but a content-update incident with the same blast-radius shape an AI update would have. The implications for AI-in-security are covered in post 6. The industry conversation has been measured and so has the regulatory response; both will continue through 2025.

Determinism became a procurement question. The questions from post 2 — same input, same output, version-pinned auditing — were, twelve months ago, my own peculiar preoccupation. They are now appearing in serious enterprise procurement RFPs. The shift is welcome and was overdue.

Five things that were less real than the press suggested

Autonomous AI agents in production. The agent demos through 2024 were impressive. The production deployments of unbounded autonomous agents making consequential cyber decisions were, as far as I can find, essentially zero. Post 8 covers why the gap is structural. The shape of agent that actually ships is constrained, deterministic at the action layer, and looks more like what EmilyAI has been since 2018 than what the demos suggest.
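To make "deterministic at the action layer" concrete, here is a minimal sketch of one way such a gate could look. The action names, the per-action target limits and the `Proposal` type are all hypothetical, invented for illustration; they do not describe EmilyAI or any vendor's product. The idea is only that whatever proposes the action (a model, an agent loop) can be as non-deterministic as it likes, while the layer that decides whether the action runs is a plain lookup.

```python
from dataclasses import dataclass

# Hypothetical allowlist: action name -> maximum number of targets
# the agent may touch without a human in the loop.
ALLOWED_ACTIONS = {
    "isolate_host": 1,
    "disable_account": 1,
    "open_ticket": 50,
}


@dataclass(frozen=True)
class Proposal:
    """An action suggested by the (possibly non-deterministic) model layer."""
    action: str
    targets: tuple


def gate(proposal: Proposal) -> str:
    """Deterministic action layer: the same proposal always gets the same decision."""
    max_targets = ALLOWED_ACTIONS.get(proposal.action)
    if max_targets is None:
        return "reject"      # action is not on the allowlist at all
    if not proposal.targets:
        return "reject"      # nothing to act on
    if len(proposal.targets) > max_targets:
        return "escalate"    # allowed action, but too broad for autonomy
    return "execute"


print(gate(Proposal("isolate_host", ("ws-042",))))           # execute
print(gate(Proposal("isolate_host", ("ws-042", "ws-043"))))  # escalate
print(gate(Proposal("wipe_disk", ("ws-042",))))              # reject
```

The gate is small enough to review line by line and to replay in an audit, which a free-form agent loop is not.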

AI-driven cyber offence at scale. Press coverage through 2024 included regular suggestions that AI was meaningfully changing the offensive cyber landscape. The evidence I see suggests this is partly true and largely overstated. Phishing has got better grammar; the click-through rate has not materially shifted. Voice clones have been used in targeted fraud; the volume is small. Code generation has helped script-kiddie-tier capability climb a rung; the rung above is unchanged. The dramatic claims about AI fundamentally changing the offensive landscape have not landed in 2024.

The CISO replaced by AI. A genre piece that recurred through the year. The CISO has not been replaced. The CISO's tooling has improved. The CISO's audit-and-accountability obligations have, if anything, increased; post 6 of the privacy series covers the SEC's charges against SolarWinds CISO Tim Brown. Replacing the CISO is not on any 2025 roadmap I have seen.

The end of the human analyst. Closely related. Human analyst headcount in UK security operations has not measurably dropped through 2024. AI augmentation has changed what the human analyst spends time on (less triage, more investigation and supervision) rather than reducing how many of them there need to be. The economics may shift in some sectors over the next two to three years; they have not shifted yet.

AI-driven adversary emulation displacing red teams. The AI red team category was loud through 2024. The production deployments are scarce. Real red teams continue to do real work. AI-assisted offensive tooling is increasingly part of the red teamer's kit; it has not displaced the team. The CREST methodology work I described in the mid-2024 reflection covers the trajectory.

Where EmilyAI stands at the start of 2025

A short reflection.

The architectural decisions from 2018 (hexagonal pattern, deterministic INT8 inference, continuous learning from analyst feedback, structural privacy, single-tin on-prem option) have all held up through the LLM era. The system continues to do the tier-two triage work it was built for, on the same underlying architecture, with the same operational properties. 2024 added a few specific things: improved CPU pre-triage performance from the Intel AMX work, a richer cross-tenant intelligence model in schema v1.1, and the interaction ring (Slack, Teams, voice, SMS) added in schema v1.2.
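A small aside on why integer inference and determinism tend to go together. The sketch below is a general illustration, not a description of EmilyAI's inference engine: the vectors are random, the function name is hypothetical, and Python integers stand in for a hardware accumulator. It shows that integer accumulation gives the same result whatever order the terms are summed in, while floating-point addition does not.

```python
import random


def int8_dot(weights, activations):
    """Dot product over INT8-range values using exact integer accumulation."""
    return sum(w * a for w, a in zip(weights, activations))


rng = random.Random(0)
weights = [rng.randint(-128, 127) for _ in range(1024)]
activations = [rng.randint(-128, 127) for _ in range(1024)]

baseline = int8_dot(weights, activations)

# Accumulate the same products in a different order, as a different
# thread schedule or vector width might do on real hardware.
order = list(range(1024))
rng.shuffle(order)
shuffled = int8_dot([weights[i] for i in order],
                    [activations[i] for i in order])

# Integer addition is associative, so the order does not matter.
integer_order_independent = baseline == shuffled

# Floating-point addition is not associative, so the order can matter.
float_order_independent = ((0.1 + 0.2) + 0.3) == (0.1 + (0.2 + 0.3))

print(integer_order_independent)  # True
print(float_order_independent)    # False
```

Order-independent accumulation is only one ingredient; a real engine also has to pin its quantisation steps and kernels before same-input, same-output holds end to end.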

What is changing in 2025: a closer integration with augmentation reasoning models for the human analysts who supervise EmilyAI's escalations. The reasoning models are a useful tool for the human side; they are not the right tool for the analyst-replacement side. The integration is straightforward because the platform was built to allow the augmentation layer to be replaced over time without touching the analyst.

The model registry now contains EmilyAI versions back to 2019. Re-running any historical event from any of our customers through the model version that handled it originally produces, to this day, the same verdict as before. This is the property post 2 describes; it has paid back in compliance work several times this year.
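The replay property can be sketched in a few lines. Everything here is a toy built on assumptions: the two "models" are stand-in rule functions, the version labels are invented, and the record layout is hypothetical rather than EmilyAI's actual schema. What it shows is the audit loop itself: store the event, a hash of the event, the model version and the verdict, then later re-run the event through the pinned version and compare.

```python
import hashlib
import json


# Stand-in "models": pure functions of the event, pinned by version label.
def model_2019_1(event):
    return "escalate" if event["failed_logins"] >= 5 else "close"


def model_2024_3(event):
    return "escalate" if event["failed_logins"] >= 3 or event["new_device"] else "close"


REGISTRY = {"2019.1": model_2019_1, "2024.3": model_2024_3}


def fingerprint(event):
    """SHA-256 over a canonical JSON encoding, so key order cannot change the hash."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record(event, version):
    """What gets stored at decision time."""
    return {
        "event": event,
        "event_sha256": fingerprint(event),
        "model_version": version,
        "verdict": REGISTRY[version](event),
    }


def replay_matches(rec):
    """Re-run a stored event through the version that originally handled it."""
    if fingerprint(rec["event"]) != rec["event_sha256"]:
        return False  # the stored event has been altered since the decision
    return REGISTRY[rec["model_version"]](rec["event"]) == rec["verdict"]


event = {"failed_logins": 4, "new_device": False}
old = record(event, "2019.1")
new = record(event, "2024.3")
print(old["verdict"], replay_matches(old))  # close True
print(new["verdict"], replay_matches(new))  # escalate True
```

The two versions disagree on the same event, which is exactly why the version label has to be stored alongside the verdict: replaying through the current model would not reproduce the historical decision.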

What 2025 looks like

Three short predictions, with the usual humility.

The constrained-agent category gets named and ships. The shape of bounded-autonomy agent that post 8 describes will be the dominant production deployment pattern through 2025. The vendors who have been doing it for years will be reframed as agentic. New entrants will discover that the constrained shape is what regulated customers will buy.

Reasoning model cost drops by a factor of ten. Inference optimisations, open-source competitors, and competitive pressure on the hosted incumbents will push reasoning-model cost down sharply. By the end of 2025, reasoning quality will be the default rather than the premium.

Cyber regulation explicitly addresses AI for the first time in the UK. The Cyber Security and Resilience Bill is widely expected to include AI-specific provisions. The shape is unclear; the inclusion is now politically inevitable.

What I will be writing about in the first half of 2025

A short forward-look. Three pieces I have drafted material for.

In February, the Anthropic Computer Use feature and what the operator question means for security operations.

In April, continuous learning at scale — the operational properties of running a model that updates with feedback over years, and the things the LLM-as-frozen-artefact shape gets wrong.

In May, the cross-tenant intelligence question — the privacy architecture problem nobody talks about, and what EmilyAI's schema-versioning approach has taught me about getting it right.

What is next

In six weeks: Anthropic Computer Use, the operator question, and what we mean when we say AI takes actions on the customer's behalf.