Determinism and regulatory defensibility, eighteen months later

Post 14 of the AI series. The bit-identical-inference property I wrote about in 2024 is now showing up in regulatory drafting. This post looks at what the Cyber Security and Resilience Bill drafting work suggests about how regulators are going to evaluate AI-driven security decisions.

_Post 14 of the AI in cyber series._

When I wrote post 2 of this series, on deterministic inference, the determinism property was a niche concern. Eighteen months later, it has moved from a security architect's preference to something the regulator is going to ask about. This post covers that shift, what is driving it, and what the Cyber Security and Resilience Bill drafting work suggests is coming.

I am writing this with two caveats. First, the Bill's secondary legislation is not yet published and the specific text will change. Second, the conversation I am summarising is happening in committee work I have been close to but cannot quote directly. The shape is, I think, clear enough to be worth describing.

What the regulator wants to be able to do

The thing that has crystallised in the past year, in conversations with both ICO staff and sectoral regulators, is what regulatory defensibility of an AI-driven security decision actually requires.

Three properties, in order of how often they come up.

Reproducibility. The regulator's question is: can you show me, today, what your AI decided about this incident six months ago, and can you re-run that decision now to confirm it produces the same output? For most current AI security tooling, the honest answer is no. The same prompt, on the same data, through the same vendor, does not reliably produce the same output. The audit trail typically records "the AI flagged this case" but does not capture the model artefact, the input, and the output in a form that can be re-executed.
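To make "in a form that can be re-executed" concrete, here is a minimal Python sketch of an audit record that pins the exact model artefact by its SHA-256 digest and stores the exact input and output as canonical JSON. The field names and the `make_record` helper are hypothetical, not any vendor's schema, and the sketch assumes the artefact file itself is retained so it can be reloaded later.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def canonical_json(obj) -> bytes:
    """Sorted keys, no whitespace: equal content always serialises identically."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


@dataclass(frozen=True)
class AuditRecord:
    event_id: str
    model_version: str   # hypothetical human-readable label
    model_sha256: str    # digest of the exact model artefact file that ran
    input_json: str      # canonical JSON of the exact input
    output_json: str     # canonical JSON of the exact output
    recorded_at: str     # ISO 8601 timestamp

    def digest(self) -> str:
        """Digest over the whole record; storing it separately makes later edits detectable."""
        return sha256_hex(canonical_json(asdict(self)))


def make_record(event_id, model_version, model_bytes, model_input, model_output, recorded_at):
    """Build a record from the artefact bytes and the raw input/output objects."""
    return AuditRecord(
        event_id=event_id,
        model_version=model_version,
        model_sha256=sha256_hex(model_bytes),
        input_json=canonical_json(model_input).decode("utf-8"),
        output_json=canonical_json(model_output).decode("utf-8"),
        recorded_at=recorded_at,
    )


rec = make_record(
    event_id="evt-0001",
    model_version="triage-v1",
    model_bytes=b"placeholder model weights",
    model_input={"src_ip": "10.0.0.5", "bytes_out": 4096},
    model_output={"verdict": "benign"},
    recorded_at="2025-01-15T09:30:00Z",
)
print(rec.input_json)
print(rec.digest())
```

Canonical serialisation matters here: without sorted keys, two identical inputs could hash differently and a replay check would report false mismatches.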

Auditable model lineage. Which model version, exactly, produced this verdict? When was that model trained, on what data, with what feedback loop? When was it deployed, on what schedule, with what rollback options? The regulator wants to be able to follow the chain from a specific verdict back through the model's history. The hosted-LLM products that update centrally on the vendor's schedule do not give the customer this lineage in usable form.
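The chain from a verdict back through the model's history can be represented as parent-linked version records. A minimal Python sketch, assuming one record is kept per deployed model version; `ModelVersion`, its fields, and the `lineage` helper are hypothetical illustrations rather than a standard format. Each question in the paragraph above maps to a field someone has to populate at deployment time, not reconstruct afterwards.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ModelVersion:
    version: str
    parent: Optional[str]       # version this one was retrained or fine-tuned from
    training_snapshot: str      # hypothetical ID of a frozen training-data snapshot
    feedback_loop: str          # hypothetical note on which feedback fed retraining
    deployed_at: str            # ISO 8601 date it went live in this environment
    rollback_to: Optional[str]  # version to restore if this one is withdrawn


def lineage(registry: Dict[str, ModelVersion], version: str) -> List[ModelVersion]:
    """Walk parent links from `version` back to the root model, newest first."""
    chain: List[ModelVersion] = []
    seen = set()
    current: Optional[str] = version
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in lineage at {current}")
        if current not in registry:
            raise KeyError(f"no lineage record for {current}")
        seen.add(current)
        entry = registry[current]
        chain.append(entry)
        current = entry.parent
    return chain


registry = {
    "v1": ModelVersion("v1", None, "snap-2024-03", "none", "2024-04-02", None),
    "v2": ModelVersion("v2", "v1", "snap-2024-09", "analyst verdicts Q2-Q3", "2024-10-07", "v1"),
    "v3": ModelVersion("v3", "v2", "snap-2025-02", "analyst verdicts Q4", "2025-03-10", "v2"),
}
for step in lineage(registry, "v3"):
    print(step.version, step.training_snapshot, step.deployed_at)
```

A missing record raises rather than returning a shorter chain, because a silently truncated lineage is exactly the gap the regulator would be probing.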

Demonstrable human authority. Who, specifically, is accountable for the AI-driven decision? Was a human in the loop, and if so, at what stage? Can you show me what they saw, what they decided, and on what basis? This is the property that the constrained-agency shape supports well and the unbounded-operator shape supports poorly.
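One way to make "what they saw, what they decided, and on what basis" answerable is to store a decision record alongside each AI recommendation. A minimal Python sketch; the `HumanDecision` fields and the `authority_gaps` check are hypothetical, and a production system would also need to make these records tamper-evident.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class HumanDecision:
    recommendation_id: str            # links back to the AI's audit record
    approver: Optional[str]           # named individual; None if no human was involved
    stage: str                        # hypothetical values: "before action", "after action"
    evidence_viewed: Tuple[str, ...]  # identifiers of what the approver was shown
    decision: str                     # e.g. "approve", "reject", "modify"
    rationale: str                    # the stated basis for the decision
    decided_at: str                   # ISO 8601 timestamp


def authority_gaps(d: HumanDecision) -> List[str]:
    """Return the accountability questions this record cannot answer."""
    gaps = []
    if not d.approver:
        gaps.append("who is accountable")
    if not d.evidence_viewed:
        gaps.append("what they saw")
    if not d.decision:
        gaps.append("what they decided")
    if not d.rationale.strip():
        gaps.append("on what basis")
    return gaps


complete = HumanDecision("rec-0042", "j.smith", "before action",
                         ("alert-0042", "host-timeline-ws-042"), "approve",
                         "Beaconing confirmed against proxy logs", "2025-06-03T10:14:00Z")
thin = HumanDecision("rec-0043", None, "after action", ("alert-0043",),
                     "approve", "", "2025-06-03T10:20:00Z")
print(authority_gaps(complete))  # []
print(authority_gaps(thin))      # ['who is accountable', 'on what basis']
```

The `stage` field records whether the human was in the loop before the action or reviewed it afterwards, which is the distinction between the constrained-agency and unbounded-operator shapes.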

These three properties, in combination, are what the regulator means when they say "we want to be able to evaluate the AI's role in this incident". The Bill's drafting work has been converging on language that requires all three for AI-driven security decisions in regulated entities.

Why the determinism property is load-bearing

A specific reason determinism matters here that I underemphasised in the original post.

Without deterministic inference, none of the three regulatory properties is fully achievable. "Which model produced this verdict?" is a meaningless question if re-running the same model on the same input would produce a different verdict. "Who is accountable?" is awkward to answer if the basis for accountability, the AI's reasoning, could not be reproduced as it stood when the decision was made. And reproducibility itself is the property determinism makes possible.

This is why I have come round to thinking that determinism is not, as I originally framed it, one of several properties worth caring about. It is the load-bearing property. Audit, accountability, and reproducibility all rest on it. The vendors who have not engineered for it will struggle to retrofit. The vendors who have, even in a quiet way, are well-positioned for the regulatory environment the Bill is bringing in.

What the Bill's drafting work suggests

This covers only what is public. The Cyber Security and Resilience Bill, in its current secondary-legislation drafting, addresses AI-driven security decisions in a few specific places.

The Bill's incident-reporting provisions (the 24-hour and 72-hour clocks I described in the board read of the Bill) apply to incidents involving AI-driven decisions in the same way as to any other incident. The 72-hour report has to include what the firm's AI tooling did, when, and on what basis. That is the requirement hardest to meet with current LLM-based products.
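A sketch of assembling the AI section of that report from an audit log, in Python. It assumes the 72-hour clock runs from the moment the firm becomes aware of the incident (the final secondary legislation should be checked), and the log-entry fields (`at`, `model_version`, `action`, `basis`) are hypothetical. Entries with no recorded basis are flagged rather than silently dropped, because that gap is what the report would expose.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc


def full_report_due(aware_at: datetime) -> datetime:
    """72-hour deadline, assuming the clock starts when the firm becomes aware."""
    if aware_at.tzinfo is None:
        raise ValueError("use a timezone-aware timestamp")
    return aware_at + timedelta(hours=72)


def ai_activity_section(entries, start: datetime, end: datetime) -> list:
    """What the AI did, when, and on what basis, for entries in the incident window."""
    in_window = sorted((e for e in entries if start <= e["at"] <= end), key=lambda e: e["at"])
    return [
        f'{e["at"].isoformat()} {e["model_version"]}: {e["action"]} '
        f'(basis: {e.get("basis") or "NOT RECORDED"})'
        for e in in_window
    ]


log = [
    {"at": datetime(2025, 9, 1, 9, 5, tzinfo=UTC), "model_version": "triage-v7",
     "action": "quarantined host ws-042", "basis": "beaconing to known C2 range"},
    {"at": datetime(2025, 9, 1, 8, 50, tzinfo=UTC), "model_version": "triage-v7",
     "action": "raised alert on ws-042", "basis": None},
    {"at": datetime(2025, 8, 30, 12, 0, tzinfo=UTC), "model_version": "triage-v6",
     "action": "closed unrelated ticket", "basis": "duplicate"},
]

aware = datetime(2025, 9, 1, 8, 30, tzinfo=UTC)
print("Full report due:", full_report_due(aware).isoformat())
section = ai_activity_section(log, datetime(2025, 9, 1, 8, 0, tzinfo=UTC),
                              datetime(2025, 9, 1, 14, 0, tzinfo=UTC))
for line in section:
    print(line)
```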

The Bill's supply-chain provisions (the regulator's right to demand evidence of diligence on critical suppliers) extend to AI security vendors. "Which AI products are doing security work in our environment, on what basis, and with what audit guarantees?" is diligence the regulator can require the customer to demonstrate.

The Bill's enforcement provisions (penalties of up to £17m or 4% of global turnover) apply to incidents where the AI tooling was material to the failure. In those cases, the firm's defence will include its ability to demonstrate what the AI did and why. "We do not know what our AI did" is not a defence the regulator will accept.

The Bill does not, on its current drafting, require deterministic AI inference. It does, however, set expectations about audit, reproducibility, and accountability that are easier to meet with deterministic inference than without.

What this means for procurement now

Three practical questions for any AI security procurement in the second half of 2025.

Can the vendor's audit log answer, for a specific event: which model version handled it, what its input was, what its output was, and when that model was deployed? If the answer is yes to all four, the vendor is in a good place. If it is yes to some and no to others, those gaps are exactly what the customer's regulator may eventually ask about.
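The four-part test can be run mechanically over a sample of exported log entries during evaluation. A minimal Python sketch, assuming the vendor can export entries as JSON-like records; the field names in `QUESTIONS` are hypothetical placeholders to be mapped onto the vendor's actual schema.

```python
from typing import Dict, List

# Hypothetical field names for an exported audit-log entry; map them to
# whatever the vendor's export actually uses.
QUESTIONS: Dict[str, str] = {
    "model_version": "which model version handled this event",
    "input": "what its input was",
    "output": "what its output was",
    "model_deployed_at": "when that model was deployed",
}


def unanswered(entry: dict) -> List[str]:
    """Questions this entry cannot answer because the field is missing or empty."""
    return [q for field, q in QUESTIONS.items() if entry.get(field) in (None, "", {}, [])]


def coverage(entries: List[dict]) -> float:
    """Fraction of sampled entries that answer all four questions."""
    if not entries:
        return 0.0
    return sum(1 for e in entries if not unanswered(e)) / len(entries)


sample = [
    {"model_version": "triage-v7", "input": {"src_ip": "10.0.0.5"},
     "output": {"verdict": "benign"}, "model_deployed_at": "2025-03-10"},
    {"model_version": "triage-v7", "output": {"verdict": "malicious"}},
]
for e in sample:
    print(unanswered(e))
print(coverage(sample))
```

Running it against a real export turns "some yes, some no" into a specific list of gaps to raise with the vendor.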

Can the vendor demonstrate the human-authority chain for any AI-driven decision in their product? That is: a human took this consequential action, on this AI's recommendation, with these tools available to them, at this timestamp. The shape of the answer matters more than the specifics; it should be "yes, here is the chain" rather than "yes, in principle".

Has the vendor engineered for the determinism property? If yes, what is the operational evidence: the same input today producing the same output as six months ago, on the same model version? If no, what is the vendor's plan for the regulatory environment now visible on the horizon?
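The operational evidence can take the form of a replay test: take stored records, re-run each input through the same model version, and compare outputs by digest. A minimal Python sketch; the record layout, the `models` mapping from version to callable, and the toy `triage_v1` rule are all hypothetical stand-ins for a real pinned model artefact. A byte-exact comparison only passes if inference is deterministic end to end, which is the point of the test.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple


def output_digest(output) -> str:
    """SHA-256 over canonical JSON, so key order does not affect the comparison."""
    data = json.dumps(output, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def replay(records: List[dict], models: Dict[str, Callable]) -> List[Tuple[str, str]]:
    """Re-run each stored input on the same model version and report any differences."""
    problems = []
    for rec in records:
        model = models.get(rec["model_version"])
        if model is None:
            problems.append((rec["event_id"], "model version no longer available"))
        elif output_digest(model(rec["input"])) != rec["output_digest"]:
            problems.append((rec["event_id"], "output differs"))
    return problems


# Toy stand-in for a pinned, deterministic model (hypothetical threshold rule).
def triage_v1(event: dict) -> dict:
    return {"verdict": "malicious" if event["bytes_out"] > 10_000_000 else "benign"}


# Records as they would have been written six months ago.
history = [
    {"event_id": "evt-0001", "model_version": "triage-v1",
     "input": {"bytes_out": 4096}, "output_digest": output_digest({"verdict": "benign"})},
    {"event_id": "evt-0002", "model_version": "triage-v1",
     "input": {"bytes_out": 50_000_000}, "output_digest": output_digest({"verdict": "malicious"})},
]

print(replay(history, {"triage-v1": triage_v1}))  # [] means every decision reproduced
print(replay(history, {}))                        # both flagged: model no longer available
```

The "model version no longer available" case is worth testing deliberately: a vendor that cannot reload last year's artefact cannot offer this evidence at all, however deterministic its current model is.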

These questions sound technical. Through 2026, in the procurement-and-compliance form they will take, they are becoming standard.

The wider point

I want to draw out a point that has emerged through this series and that this post crystallises.

The interesting architectural decisions about AI in cyber security are not, in 2025, primarily about the AI itself. They are about the operational properties around the AI: the audit trail, the model lineage, the human authority, the cross-tenant intelligence governance, the regulatory defensibility. These properties have to be designed in, not retrofitted. The vendors who designed for them years ago, when nobody was asking, are now positioned where the questions are arriving.

The wider AI conversation has spent eighteen months focused on "whose model is best?". Over the same eighteen months, the cyber security application has been learning that "whose operational discipline is best?" is the more durable question.

What is next

In six weeks: a piece on the single-tin posture. It covers the Dell PowerEdge R760 I have referred to throughout this series, and why in 2025 we still ship on a single piece of hardware racked at the customer. It sets out the economic, regulatory, and architectural argument for "one machine, on-prem, all the AI work happens here" as a deployment shape worth defending against the hyperscaler default.