A startup founder told a story in April that has been doing the rounds. A Cursor AI coding agent, running under standard developer credentials, deleted his production database and its backups in nine seconds. The agent did exactly what it was authorised to do. It just did it on the basis of an instruction the founder did not realise it would treat as a command.

That is not an AI safety story. It is a credentials story dressed up as one.

Where the mistake actually sat

The agent was given the same broad access a senior engineer would have, on the assumption that it would behave the way a senior engineer would. Senior engineers do not drop production tables because (a) they can be fired and (b) they hesitate. Neither of those properties transfers to a model.

When we hand an agent the credentials of a person, we are implicitly assuming it has the judgement of a person too. The judgement does not come with the credentials. We have to add it back in deliberately: through narrower scopes, through confirmation steps for irreversible actions, through audit trails the agent cannot suppress, and through separation between read and write at the credential level rather than at the model's discretion.
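For readers who want to see what those four controls look like when written down, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of how Cursor or any other product works: the names Credential, Gate and IRREVERSIBLE, and the action names, are all hypothetical, and the in-memory list stands in for an audit store that would in practice live somewhere the agent cannot write to.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical action names, chosen for illustration only.
IRREVERSIBLE = frozenset({"drop_table", "delete_backup"})


@dataclass(frozen=True)
class Credential:
    agent: str
    scopes: frozenset  # the only actions this credential can perform at all


@dataclass
class Gate:
    # Asks a human whether to proceed; returns True only on explicit approval.
    confirm: Callable[[str, str], bool]
    # Stand-in for an append-only audit store outside the agent's control.
    audit_log: list = field(default_factory=list)

    def request(self, cred: Credential, action: str, target: str) -> bool:
        if action not in cred.scopes:
            decision = "denied: out of scope"
        elif action in IRREVERSIBLE and not self.confirm(action, target):
            decision = "denied: not confirmed"
        else:
            decision = "allowed"
        # Every request is recorded, whether or not it was allowed.
        self.audit_log.append((cred.agent, action, target, decision))
        return decision == "allowed"


# A read-only credential cannot drop a table, whatever the model decides.
read_only = Credential("coding-agent", frozenset({"select"}))
gate = Gate(confirm=lambda action, target: False)  # no human has approved anything
blocked = gate.request(read_only, "drop_table", "prod.users")
```

The design point is that the refusal happens in the gate and in the credential, not in the model. The agent can ask for anything; what it can do is decided elsewhere.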

The failure mode here is old, not new. The novelty is the speed and the scale. A junior engineer with the wrong credentials can do a lot of damage in a day. An agent can do the same damage in the time it takes to read this paragraph.

The board version of this question

Most UK boards I sit on now have AI tooling somewhere in the engineering function. None of them, in my experience, can answer these in the room:

The honest answer to each of those questions is usually some variation of "we would need to come back to you". That answer is itself the finding.

A practical ask

The next board meeting question worth asking is not "are we using AI responsibly?" Boards have been asking that for two years and getting comforting answers that mean nothing.

The question is: show me the list of AI agents running with production credentials, and the matrix of what each one is allowed to do without a human in the loop.

If management can produce that document in under a week, the firm is in better shape than most. If they cannot, that is the work.
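The document itself need not be elaborate. As a rough sketch of its shape, assuming a hypothetical inventory (the agent names, credential names and actions below are invented for illustration), it is a table of agents against actions, plus one check that flags any irreversible action an agent can take unattended:

```python
# Hypothetical inventory; every name below is illustrative, not real.
AGENTS = {
    "coding-agent": {
        "credential": "prod-db-admin",
        "unattended": {"select", "update", "drop_table"},
    },
    "support-bot": {
        "credential": "prod-db-readonly",
        "unattended": {"select"},
    },
}
IRREVERSIBLE = {"drop_table", "delete_backup"}


def matrix_rows(agents):
    """Build the agent-by-action matrix as rows of strings, header first."""
    actions = sorted(set().union(*(e["unattended"] for e in agents.values())))
    rows = [["agent", "credential", *actions]]
    for name in sorted(agents):
        entry = agents[name]
        cells = ["yes" if a in entry["unattended"] else "no" for a in actions]
        rows.append([name, entry["credential"], *cells])
    return rows


def findings(agents):
    """List (agent, action) pairs where an irreversible action needs no human."""
    return sorted(
        (name, action)
        for name, entry in agents.items()
        for action in entry["unattended"] & IRREVERSIBLE
    )


for row in matrix_rows(AGENTS):
    print(" | ".join(row))
print(findings(AGENTS))
```

Any non-empty result from findings is the conversation the board should be having.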

The nine seconds is not the problem. The problem is everything that was true about your authorisation model on the day before those nine seconds turned out to matter.