Computer Use and the operator question

Post 10 of the AI series. Anthropic's Computer Use, OpenAI's Operator, Google's Project Astra. The category where AI literally moves the mouse. What this shape changes for cyber operations — and how it reads against EmilyAI's tighter action vocabulary.

_Post 10 of the AI in cyber series._

In late October 2024, Anthropic shipped Computer Use, a capability that lets Claude read a screen, move the mouse, type, click, and operate GUI-based applications. OpenAI followed in January 2025 with Operator, and Google has shown similar ambitions with Project Astra. The category, AI agents that operate the same software a human would, has been the centrepiece of the early 2025 AI conversation.

For cyber operations specifically, this is a category worth understanding carefully. AI that takes actions on the customer's behalf now means something more concrete than it did six months ago. This post covers what shape that takes, what is genuinely new, and how it compares with EmilyAI's structured action vocabulary.

What the operator category actually does

A concrete characterisation first, since the marketing tends toward abstraction.

An operator-style agent receives a goal stated in natural language: "book a meeting with Sarah for tomorrow", "check whether this user account has been used unusually recently", "deploy the standard hardening profile to the new endpoint". The agent then operates whatever software is needed by reading the screen, moving the cursor, typing into fields, and clicking buttons. It does not integrate with the applications' APIs or call functions built for them. It uses the GUI exactly as a human would.
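A minimal sketch of that loop may help make the shape concrete. Everything in it is hypothetical: `Click`, `TypeText`, `Done` and `run_operator` are illustrative names loosely modelled on the screenshot, click and type primitives these products expose, not any vendor's API, and the policy is a scripted stand-in for the model's decisions.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

# Hypothetical GUI primitives, loosely modelled on operator-style agents.
# These are illustrative names, not any vendor's actual API.
@dataclass(frozen=True)
class Click:
    x: int
    y: int

@dataclass(frozen=True)
class TypeText:
    text: str

@dataclass(frozen=True)
class Done:
    summary: str

Action = Union[Click, TypeText, Done]
Screen = str  # stand-in for a screenshot; a real agent would pass pixels

def run_operator(
    goal: str,
    observe: Callable[[], Screen],
    policy: Callable[[str, Screen, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> str:
    """Observe the screen, ask the policy for the next GUI action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = observe()
        action = policy(goal, screen, history)
        history.append(action)
        if isinstance(action, Done):
            return action.summary
        execute(action)  # in a real deployment: move the cursor, send keystrokes
    return "step budget exhausted"

# Usage: a scripted policy stands in for the model.
script = iter([Click(120, 40), TypeText("jsmith"), Done("no unusual logins found")])
executed: List[Action] = []
result = run_operator(
    "check whether this user account has been used unusually recently",
    observe=lambda: "identity console",
    policy=lambda goal, screen, history: next(script),
    execute=executed.append,
)
print(result)    # no unusual logins found
print(executed)  # [Click(x=120, y=40), TypeText(text='jsmith')]
```

Note what is absent: nothing in the loop knows which application it is driving. The only interface is the screen, which is exactly why the approach generalises and why its failures are hard to predict.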

The appeal is obvious. Almost every piece of business software has a GUI. APIs are inconsistent, often poorly documented, and frequently restricted. An agent that can operate the GUI bypasses all of that: it does not need to know that the booking tool's API is broken, because it can click the same buttons the human would.

The cost is the flip side of operating at the GUI layer. The agent is slower than a function call. Its actions are visible: the cursor moves on a real screen somewhere. And failures look different. The GUI changes, a popup appears, or the page renders slightly differently, and a coordinate-based action lands somewhere unexpected.
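The coordinate failure mode is easy to reproduce in miniature. This sketch assumes a hypothetical console whose layout is a list of named bounding boxes. The agent plans a click against the layout it saw in its screenshot, a banner pushes the page down 40 pixels before the click lands, and the same coordinates now resolve to a different element.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Box:
    """A named on-screen element and its bounding box, in pixels."""
    name: str
    x0: int
    y0: int
    x1: int
    y1: int

    def contains(self, x: int, y: int) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

def element_at(layout: List[Box], x: int, y: int) -> Optional[str]:
    """Return the element under (x, y); later boxes are treated as drawn on top."""
    for box in reversed(layout):
        if box.contains(x, y):
            return box.name
    return None

# Layout in the screenshot the agent reasoned over (hypothetical console).
seen = [Box("Delete rule", 100, 160, 180, 190), Box("Search", 100, 200, 180, 230)]
# The agent plans a click at the centre of "Search".
x, y = 140, 215

# Before the click lands, a 40 px banner appears and pushes the content down.
shifted = [Box("Banner", 0, 0, 800, 40)] + [
    Box(b.name, b.x0, b.y0 + 40, b.x1, b.y1 + 40) for b in seen
]

print(element_at(seen, x, y))     # Search
print(element_at(shifted, x, y))  # Delete rule
```

From the console's point of view, the second click is a perfectly valid click on "Delete rule". Nothing in the event itself records that the agent meant something else.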

What this changes for cyber operations

Three implications, in roughly the order of how much I think they will matter.

The agent-as-analyst frame becomes more plausible at the augmentation layer. A human analyst's daily workflow involves clicking through multiple consoles: SIEM, EDR, ticketing system, threat intelligence platform, email gateway. An operator-style agent can plausibly perform the same sequence of clicks, in the same software, faster. For augmentation tasks such as "check this indicator across our four consoles and report what you find", the operator pattern is the most natural fit. Early 2025 experience suggests the agents are reliable enough at this shape of task to be useful.

The action vocabulary becomes structurally unbounded. This is the property I want to spend most of this post on. An operator-style agent can, in principle, take any action available through any GUI it has access to, so its blast radius is structurally larger than that of any previous shape of automated agent. "Block this user account" and "delete the entire customer database" are both accessible through the same console; the agent's decision-making process is the only thing standing between them.

The audit problem is qualitatively different. When the agent works through a human's session, a click is, in audit-log terms, identical whether the human or the agent made it. The customer's identity system sees "user X performed action Y". The question "was this action taken by a human or an agent?" is not, by default, answerable from the customer's own logs. The implications for incident response and regulatory reporting are real and are still being worked through.
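The attribution gap can be shown with a hypothetical audit schema. The field names and values below are illustrative assumptions; the point is that when the agent drives the human's own session, the two events are field-for-field identical. Running the agent under its own principal, as in the IAM-scoped pattern discussed later in this post, is one way to make the question answerable again.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuditEvent:
    """Fields a console audit log might record (hypothetical schema)."""
    actor: str
    action: str
    target: str
    source_ip: str

# Alice isolates a host herself, from her own workstation.
by_human = AuditEvent("alice", "isolate_host", "ws-0142", "10.0.8.15")
# An operator agent driving Alice's logged-in session on that workstation does the same.
by_agent = AuditEvent("alice", "isolate_host", "ws-0142", "10.0.8.15")
print(by_human == by_agent)  # True: the log cannot say who acted

# If the agent acts under its own principal instead, the actor field differs.
by_agent_own_id = AuditEvent("svc-operator-agent", "isolate_host", "ws-0142", "10.0.8.15")
print(by_human == by_agent_own_id)  # False
```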

The bounded action vocabulary, again

EmilyAI's action vocabulary is, by design, small. A verdict is one of a defined set of structured verdict types. An escalation is to one of a defined set of human analyst roles. A notification is via one of the defined interaction-ring channels. New actions are added to the platform by engineering, not by the model deciding it can take them. This is the constrained agency I described in post 8.

The constraint is sometimes presented as a limitation. The argument for it is structural: when the action vocabulary is small, the worst-case behaviour is bounded. The audit trail is straightforward. The model's failure modes are characterisable. The regulator's question, "what is the worst your AI can do?", has a specific answer.
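A closed vocabulary can be checked mechanically, which is what makes the worst case enumerable. The sketch below is an illustration under loud assumptions: the verdict types, analyst roles and channels are placeholder names, not EmilyAI's actual vocabulary. Any model output outside the vocabulary fails validation, and the complete action space can be listed.

```python
from enum import Enum
from typing import List, Tuple

# Hypothetical closed vocabulary: names are illustrative, not EmilyAI's real types.
class Verdict(Enum):
    BENIGN = "benign"
    SUSPICIOUS = "suspicious"
    MALICIOUS = "malicious"

class Role(Enum):
    TIER1 = "tier1_analyst"
    TIER2 = "tier2_analyst"
    INCIDENT_LEAD = "incident_lead"

class Channel(Enum):
    EMAIL = "email"
    CHAT = "chat"
    PAGER = "pager"

class Kind(Enum):
    VERDICT = "verdict"
    ESCALATE = "escalate"
    NOTIFY = "notify"

# Each action kind accepts arguments only from its own closed set.
ARGS = {Kind.VERDICT: Verdict, Kind.ESCALATE: Role, Kind.NOTIFY: Channel}

def validate(kind: str, arg: str) -> Tuple[Kind, Enum]:
    """Map model output onto the vocabulary; anything else raises ValueError."""
    k = Kind(kind)
    return k, ARGS[k](arg)

def action_space() -> List[Tuple[Kind, Enum]]:
    """Every action the model can ever take, listed exhaustively."""
    return [(k, a) for k, arg_enum in ARGS.items() for a in arg_enum]

print(len(action_space()))  # 9
print(validate("escalate", "tier2_analyst"))
try:
    validate("delete_database", "prod")
except ValueError as e:
    print("rejected:", e)
```

Adding a tenth action means changing this code, which is the engineering step rather than a model decision.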

The operator-style agent has, by design, the opposite property. Its action vocabulary is the union of every possible action across every console it can see. The worst-case behaviour is structurally large. The audit trail is the system's normal action log, undifferentiated from human actions. The regulator's question has no clean answer.

For some use cases — exploratory tasks, low-stakes automations, augmentation of skilled humans — the unbounded vocabulary is appropriate. For others — consequential actions in regulated environments — it is not.

A specific worked example

To make this concrete, consider an enterprise that wants to use an operator-style agent in its SOC to investigate alerts.

The agent receives a goal: "investigate this alert and decide whether it warrants escalation". The agent clicks through the SIEM, runs queries, inspects the affected host's EDR data, checks the user's recent activity in the identity platform, and reaches a conclusion. So far this is straightforward augmentation; the agent has not yet taken any consequential action.

Now extend the goal: "investigate and, if appropriate, isolate the affected host". The agent now has the capability to take a consequential action, host isolation, via the same console clicks. The decision to isolate is the agent's, made on the basis of the LLM's reasoning, through a non-deterministic chain that can vary between runs.

The host-isolation example is benign as automation goes (you can un-isolate the host). But the principle generalises. The same architectural pattern lets the agent take any action available in the consoles it can see. Block this user, push this configuration, delete this rule, exfiltrate this data — all are equally accessible to the agent, distinguished only by the LLM's reasoning about whether to do them.

For most enterprise SOCs in 2025, this is the structural concern that has held back unsupervised deployment of operator-style agents. The same concern, expressed slightly differently: when the action vocabulary is unbounded, the agent's prompt is, in effect, a licence to take unbounded action. The CISO who signs off on that licence is signing off on something they cannot fully characterise.

What I expect to ship in 2025

Three patterns I expect to see actually deployed through this year.

Operator agents under tight scope. Specific tasks (alert triage research, threat intelligence enrichment, configuration deployment) executed by an operator agent within tightly scoped permissions. The scope is enforced by the IAM layer, not by the agent's good behaviour, so the breadth of capability is limited to what the agent's identity can reach.
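The key property is where the check happens. In this sketch the console, not the agent, decides what an identity may do, so no amount of reasoning by the agent widens its reach. The identity names and permission strings (`svc-triage-agent`, `edr:isolate` and so on) are hypothetical; real IAM systems have their own policy syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Console:
    """Stand-in for a SOC console that authorises each request by caller identity."""
    grants: Dict[str, Set[str]] = field(default_factory=dict)
    log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def perform(self, identity: str, permission: str) -> bool:
        # The check depends only on the grant, not on why the caller asked.
        allowed = permission in self.grants.get(identity, set())
        self.log.append((identity, permission, allowed))
        return allowed

# Hypothetical identities and permission strings.
console = Console(grants={
    "svc-triage-agent": {"siem:query", "edr:read", "identity:read"},
    "alice": {"siem:query", "edr:read", "edr:isolate", "identity:read"},
})

print(console.perform("svc-triage-agent", "siem:query"))   # True
print(console.perform("svc-triage-agent", "edr:isolate"))  # False: outside its scope
print(console.log[-1])  # ('svc-triage-agent', 'edr:isolate', False)
```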

Operator agents with human-in-the-loop for consequential actions. The agent investigates and proposes the consequential action. A human ratifies before execution. The latency is acceptable for most SOC work; the audit trail records who ratified.
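A ratification gate of this kind can be small. This is a minimal sketch under stated assumptions: the set of consequential actions, the `Proposal` fields and the audit record shape are all hypothetical. Non-consequential work runs straight through; a consequential action is held until a named human ratifies it, and the audit record captures both who proposed it and who ratified it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical list of actions this deployment treats as consequential.
CONSEQUENTIAL = {"isolate_host", "block_user", "push_config", "delete_rule"}

@dataclass
class Proposal:
    action: str
    target: str
    proposed_by: str
    rationale: str
    ratified_by: Optional[str] = None

def execute(p: Proposal, audit: List[Dict[str, Optional[str]]]) -> bool:
    """Run non-consequential actions directly; hold consequential ones for a human."""
    if p.action in CONSEQUENTIAL and p.ratified_by is None:
        status = "held_for_ratification"
    else:
        status = "executed"  # a real system would call the console here
    audit.append({
        "action": p.action,
        "target": p.target,
        "status": status,
        "proposed_by": p.proposed_by,
        "ratified_by": p.ratified_by,
    })
    return status == "executed"

audit: List[Dict[str, Optional[str]]] = []
p = Proposal("isolate_host", "ws-0142", "operator-agent", "suspected C2 beaconing (illustrative)")
print(execute(p, audit))  # False: held until a human signs off
p.ratified_by = "alice"   # an analyst reviews the proposal and ratifies it
print(execute(p, audit))  # True
print(audit[-1]["ratified_by"])  # alice
```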

Constrained-agency agents (the EmilyAI shape) continuing to do the verdict-making work. A structured action vocabulary, deterministic inference, continuous learning from feedback. The operator-style agent augments the human analysts who supervise this layer. The two layers do different work.

The combination (constrained-agency agent for the verdict tier, operator-style agent for the augmentation tier, human-in-the-loop for any cross-tier consequential action) is the shape I expect most regulated SOCs to converge on through 2025.

What is next

In six weeks: continuous learning at scale. The operational properties of running a model that updates with feedback over years, and what the frozen LLM artefact shape gets structurally wrong.