WannaCry, ten days on: NHS, BSI, the structural picture

Ten days after the Friday outbreak, the picture has firmed up enough to write properly about three things: the NHS specifically, WannaCry as an event, and what comes next.

The NHS impact, on the figures that have stabilised over the past week (NHS Digital incident response report, and the National Audit Office preliminary briefing for the Public Accounts Committee), is severe but bounded. Approximately 81 NHS trusts and 8 Scottish health boards were affected. Around 19,000 patient appointments are estimated to have been cancelled. Five A&E departments diverted patients to other hospitals. No patient deaths have been directly attributed, although the operational disruption to elective surgery and diagnostic services will have produced harm that is harder to attribute. The financial cost to the NHS is being assessed by the NAO and is unlikely to be fully reported for some months; the early estimates run from £20 million in direct response and recovery cost to £100 million-plus when productivity loss and remediation are included.

The structural causes are well understood at this point. The NHS Windows estate is large, mixed, and includes a substantial population of out-of-support Windows XP and Server 2003 hosts. Their presence is a function of three things: medical-device certification cycles that lock device-control PCs to specific Windows versions, procurement decisions made over the past decade-plus, and patching cadences that have not kept pace with the threat landscape. The Cyber Essentials guidance (NCSC Cyber Essentials scheme), originally from CESG, which is now part of the National Cyber Security Centre, and the Department of Health's Caldicott review processes have been pushing in the right direction for several years, but the implementation has been uneven and the resourcing has been inadequate against the scale of the estate. None of this is news to anybody operating in the UK public-sector security community. WannaCry has made it news to everybody else.

The structural causes are reproducible. The NHS is not unique in having a Windows long-tail problem. Many UK local authorities, several large utilities, parts of the rail and telecoms infrastructure, and a substantial fraction of UK manufacturing have similar exposure. The healthcare sector's specifics around medical-device locking are particular, but the broader pattern of legacy Windows in operational use against insufficient patching is general. The next worm-grade event, when it happens — and it will — will hit a different sector with the same structural cause. The defensive work, therefore, is sector-by-sector but the lessons are sector-general.

The political response in the UK is going to drive substantial change. The Cabinet Office and the National Cyber Security Centre have been more visible than usual in the last ten days; the consultation that NCSC has launched on essential-services cyber resilience (NIS Directive transposition consultation) is going to feed into a UK NIS Directive implementation that is now likely to be more demanding than I would have predicted in March. The NIS Directive itself, the EU Directive on security of network and information systems (Directive (EU) 2016/1148, eur-lex), has a transposition deadline of May 2018; in the UK that transposition will, post-WannaCry, attract more political weight and produce stricter Operator-of-Essential-Services obligations than the original consultation contemplated. That is good. The cost to OES-classified organisations is going to be substantial; the alternative, another WannaCry-class event with another set of structural causes that everyone could see in advance, is worse.

For the customer briefings, the post-WannaCry conversation has shifted in three directions. First, the patching cadence question is now formal. Customer organisations are agreeing to specific Service Level Agreement-style commitments on patching latency for critical updates, where previously the conversation had been about best-effort cadences. Browne Jacobson agreed a 7-day SLA for critical patches at last week's board cycle; Towry has been at 5 days for trading-platform-side patches for some time and has now extended that to the wider estate. Northcott is at 14 days, with a target of 7 by year-end. The manufacturer is at 21 days and the conversation about reducing that is ongoing; the OT-adjacent estate makes 7 days operationally hard.
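
To make the latency arithmetic concrete, here is a minimal sketch of how those SLA figures could be checked against a patch's release and deployment dates. The function and variable names are hypothetical and not taken from any customer tooling; the only outside fact is that Microsoft published MS17-010 for supported Windows versions on 14 March 2017, which puts the gap to the 12 May outbreak at 59 days.

```python
from datetime import date, timedelta

# Critical-patch SLA targets in days, as quoted in the paragraph above.
# The dictionary itself is illustrative, not a real configuration file.
CRITICAL_PATCH_SLA_DAYS = {
    "Browne Jacobson": 7,
    "Towry": 5,
    "Northcott": 14,
    "Manufacturer": 21,
}


def patch_latency_days(released: date, deployed: date) -> int:
    """Calendar days between vendor release and deployment."""
    return (deployed - released).days


def sla_deadline(customer: str, released: date) -> date:
    """Latest deployment date that still meets the customer's SLA."""
    return released + timedelta(days=CRITICAL_PATCH_SLA_DAYS[customer])


def within_sla(customer: str, released: date, deployed: date) -> bool:
    """True if the patch was deployed on or before the SLA deadline."""
    return deployed <= sla_deadline(customer, released)


# MS17-010 release date and the Friday outbreak date.
MS17_010_RELEASED = date(2017, 3, 14)
OUTBREAK = date(2017, 5, 12)

# Days an unpatched but supported host stayed exposed before the outbreak.
exposure_window = patch_latency_days(MS17_010_RELEASED, OUTBREAK)

if __name__ == "__main__":
    print(f"Exposure window before outbreak: {exposure_window} days")
    for customer in CRITICAL_PATCH_SLA_DAYS:
        print(customer, "deadline:", sla_deadline(customer, MS17_010_RELEASED))
```

On these assumptions even the 21-day figure would have closed the gap on supported hosts well before the outbreak; the sketch says nothing about out-of-support hosts, which had no patch to deploy in that window.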

Second, the long-tail-Windows conversation. The customers with substantial XP or Server 2003 footprint are now moving on replacement programmes that had been in slow planning for a year or more. The cost is substantial; the political environment for approving the cost is, post-WannaCry, much more favourable than it was a month ago. We are advising fast moves on the most exposed segments while the budget conversation is winnable, with longer programmes for the harder-to-replace segments. The NHS is doing the same thing at scale.

Third, the recovery posture. Several customers asked, in the days after the 12th, whether their recovery procedures would have worked against an attack of this nature. The answer for all of them was "probably yes, with assumptions" and the work this month is to test the assumptions. Backup-restore exercises, tabletop incident-response drills, and specifically the question of whether backups themselves are protected against ransomware-class encryption have all moved up the priority list. The Browne Jacobson backup posture turned out to have a gap — backup repositories accessible from the live network — that we found in the tabletop exercise on Monday and are remediating this week. That is the kind of finding that, in retrospect, should not have surprised us, but did.
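
The backup-reachability question can be turned into a crude first check: from a host on the live network, can you open a TCP connection to the backup repository at all? The sketch below is a minimal illustration of that test, not the procedure we ran in the tabletop; the hostnames and ports in the usage section are invented placeholders.

```python
import socket
from typing import Iterable, List, Tuple

Target = Tuple[str, int]


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def exposed_repositories(targets: Iterable[Target],
                         timeout: float = 2.0) -> List[Target]:
    """Return the subset of backup targets reachable from this host."""
    return [(host, port) for host, port in targets
            if is_reachable(host, port, timeout)]


if __name__ == "__main__":
    # Hypothetical backup endpoints: SMB (445) and NFS (2049).
    backup_targets = [
        ("backup01.example.internal", 445),
        ("backup02.example.internal", 2049),
    ]
    for host, port in exposed_repositories(backup_targets):
        print(f"WARNING: {host}:{port} is reachable from the live network")
```

A successful connection only shows a network path exists. Whether ransomware running on a live host could actually write to the repository also depends on credentials and share permissions, so an empty result is necessary for isolation but not sufficient.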

The Marcus Hutchins kill-switch story has, separately, taken an unexpected turn that I will note for the record. Hutchins is being lionised in the press in a way that has, by his own account, become uncomfortable for him; the operational community knows the story is more nuanced than the press version. The wider point, that a single researcher's decision averted a substantial fraction of the harm a worm-grade event would otherwise have produced, is true and is worth the recognition. The structural point, that the existence of the kill-switch was a consequence of the worm authors' choices and not of any defensive design, is the one that matters for planning the next event. There may not be a kill-switch in the next worm.

I will be at the Infosec Europe panel on the 6th of June discussing this; the Hedgehog write-up of the customer-side response will be published on the company site in the next week. The personal blog continues; the NHS structural conversation continues for some time.