Petraeus and the metadata problem

The thing about the Petraeus-Broadwell investigation is not the affair itself. The thing is that two careful people, both of whom should have understood the operational-security threat model — a four-star general turned director of the CIA, and a counter-terrorism instructor at West Point — used a "shared-draft-folder" scheme they evidently believed was operationally adequate, and were nonetheless identified, located, and exposed within months by an FBI investigation that did not need to access the content of any of their messages. The exposure mechanism was metadata. The mechanism was not novel. It was, by the standards of any post-Patriot-Act privacy-and-encryption discussion, completely predictable. And yet two demonstrably careful people had not internalised it.

I have been thinking about this case in the context of the privacy-and-encryption methodology I have been drafting for the engagement team for the past two months, because the Petraeus case is the cleanest public illustration of the problem the methodology is trying to address. The methodology has been failing to land at the client level despite being technically straightforward, and the case I have been making — that encryption matters, that metadata matters, that the threat model has to include sustained surveillance by motivated parties — has been hard to pitch for lack of concrete examples that boards understand. Petraeus is the example.

The shared-draft-folder scheme that Petraeus and Broadwell used is operationally inadequate for the same reason that most "I'll just use Gmail with extra steps" approaches are inadequate. The communications endpoint is a Google account; the account login is associated with an IP address, a session cookie, a User-Agent string, a recovery telephone number, and a billing address; the metadata around each access produces a graph that an investigator with subpoena authority can resolve to the individual human. Each of those signals is, individually, recoverable from Google's logs through standard legal process. The aggregate is the identification.
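The aggregation step is worth making concrete. Public reporting on the case indicates that access IPs for the anonymous account resolved to hotel networks, and guest records for those networks were intersected across accesses. A toy sketch of that intersection — every name, IP, and presence list below is invented:

```python
# Toy illustration: each access to the anonymous account was made from
# some network (e.g. a hotel Wi-Fi), and the operator of that network
# can produce a list of who was present at the time. Intersecting those
# lists across several accesses narrows the candidate set to one person.
# All names and addresses here are invented.

accesses = [
    {"ip": "203.0.113.7",  "present": {"alice", "bob", "carol"}},
    {"ip": "198.51.100.4", "present": {"alice", "dave"}},
    {"ip": "192.0.2.55",   "present": {"alice", "carol", "erin"}},
]

def identify(accesses):
    """Intersect the presence lists for every access; the survivors are
    the only people who could have made all of the accesses."""
    return set.intersection(*(a["present"] for a in accesses))

print(identify(accesses))  # {'alice'}
```

No single access is identifying; the intersection is. That is the sense in which the aggregate, not any individual signal, is the identification.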

The technical answer for the case where two parties want to communicate without leaving an investigator-recoverable metadata trail involves several steps that Petraeus and Broadwell did not take. End-to-end encryption of the message content is a baseline; the OpenPGP infrastructure has been standardised since 1998 and is deployable today, though the practitioner-grade key management is poor enough that I will not pretend it is easy. Anonymising the connection metadata is the harder part — the Tor network is the canonical answer, with all the operational caveats about exit-node integrity and the ongoing question of whether traffic-correlation attacks against Tor work in practice for adversaries with global passive monitoring (which the FBI does not currently have, but which other adversaries arguably do). Compartmentalising identities is a third step — the email account used for sensitive communications must not be linkable to the named-identity online presence, which means the account creation, the recovery information, any associated phone number, and every login must use a different identity-and-network configuration than the public persona. Petraeus did not do any of this. Broadwell did not do any of this. They were both exposed within months.
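The compartmentalisation requirement can be stated mechanically: the sensitive account and the public persona must share no attribute that legal process can recover. A toy linkage check — accounts and attribute values invented, and the attribute list deliberately minimal:

```python
from collections import defaultdict

# Toy linkage check: two accounts are linkable if they share any
# subpoena-recoverable attribute (recovery phone, login IP, etc.).
# Compartmentalisation fails the moment the public persona and the
# sensitive account share even one such attribute. All values invented.

accounts = {
    "public_persona": {"phone:+15550001", "ip:203.0.113.7"},
    "sensitive_acct": {"ip:203.0.113.7"},   # same home IP: linkable
    "clean_acct":     {"phone:+15559999", "ip:198.51.100.4"},
}

def linked_groups(accounts):
    """Group accounts into connected components via shared attributes."""
    by_attr = defaultdict(set)
    for name, attrs in accounts.items():
        for a in attrs:
            by_attr[a].add(name)
    seen, groups = set(), []
    for name in accounts:
        if name in seen:
            continue
        group, frontier = set(), {name}
        while frontier:                      # breadth-first over shared attributes
            n = frontier.pop()
            if n in group:
                continue
            group.add(n)
            for a in accounts[n]:
                frontier |= by_attr[a] - group
        seen |= group
        groups.append(group)
    return groups

for g in linked_groups(accounts):
    print(sorted(g))
```

Here the "clean" account forms its own component; the sensitive account collapses into the public persona's component through a single shared login IP — which is roughly what happened in the actual case.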

For the client engagements where this methodology matters — News International on the source-protection question, Browne Jacobson on privileged-client-communications, several Hedgehog clients with whistleblower or sensitive-source workflows — I have been adapting the structural argument from "use encryption" to a sharper articulation: the threat model has to include the metadata, not just the content. The technical recommendations I have been drafting are: end-to-end content encryption (OpenPGP for email, OTR for chat where applicable, TLS-with-perfect-forward-secrecy for transport); compartmentalisation of identities at the platform level (separate email accounts, separate phones where the threat model justifies it, separate browser profiles); use of Tor or similar anonymising network for the highest-risk communications, with a documented threat-model justification for when this is necessary and when the operational cost is not justified; and explicit attention to the metadata that platform operators retain and that legal process can compel them to surrender. Bruce Schneier's piece on the Petraeus case last week is the right summary of the technical argument; I have been pointing clients at it.

The TLS-deployment piece of this is, in 2012, both simpler and more pressing than it has been historically. Eight per cent of the Alexa top-million sites now serve HTTPS by default, which is up from approximately three per cent in 2010 and is a continuation of the post-Operation Tunisia trajectory. The HSTS draft RFC is in late stages of standardisation and the major browsers have shipped support; the structural problem of "user types login.example.com, gets HTTP, attacker man-in-the-middles before the redirect to HTTPS" is on the verge of having a deployable answer. Perfect forward secrecy via ECDHE cipher suites is now widely supported in modern TLS implementations and is becoming standard at the major platforms. The argument I have been making to clients is that "we use HTTPS" is not in 2012 a sufficient claim; the claim should be "we use HTTPS with HSTS and forward secrecy", and the case for that articulation has been reinforced by the Comodo and DigiNotar precedents.
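Both halves of the sharper claim are mechanically checkable. A minimal sketch of the two checks, assuming OpenSSL-style cipher-suite names; the parsing is deliberately simplistic and illustrative, not audit-grade:

```python
# Two small checks behind the "HTTPS with HSTS and forward secrecy"
# articulation. Cipher names follow OpenSSL conventions; an ephemeral
# (EC)DHE key exchange is what provides forward secrecy, because the
# session keys are discarded and a later compromise of the server's
# long-term key does not decrypt recorded traffic.

def has_forward_secrecy(cipher):
    """True if the negotiated suite uses ephemeral Diffie-Hellman."""
    return cipher.startswith(("ECDHE-", "DHE-"))

def hsts_max_age(header):
    """Extract max-age from a Strict-Transport-Security header value;
    returns 0 if absent (i.e. HSTS not effectively enabled)."""
    for directive in header.split(";"):
        k, _, v = directive.strip().partition("=")
        if k.lower() == "max-age" and v.isdigit():
            return int(v)
    return 0

print(has_forward_secrecy("ECDHE-RSA-AES128-SHA"))          # True
print(has_forward_secrecy("AES256-SHA"))                    # False: static RSA key exchange
print(hsts_max_age("max-age=31536000; includeSubDomains"))  # 31536000
```

The point of the second check is that an HSTS header with no (or a zero) max-age is indistinguishable, from the browser's perspective, from no HSTS at all.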

The OpenPGP problem is more difficult and I have made less progress on the engagement-team material here. PGP is technically capable of doing what the threat model requires, but the practitioner-grade key-management infrastructure is awful. Key generation, key publication, key signing, key revocation, key rotation — each of these has tooling that is hostile to non-expert users, error-prone, and operationally fragile. The Enigmail extension for Thunderbird is the closest thing to a deployable answer for non-developer users, and it is not a great deployable answer. The GPGTools project on macOS has been improving but is not at the maturity I would want. The honest answer for most of my clients is that PGP is the right answer for senior-staff-with-source-protection-concerns and is too operationally costly for general use. There is a role for a tool with PGP's security properties and substantially better usability, and I have not seen one yet.

For the SOC build, the privacy-and-encryption methodology has been informing what we monitor for as well as what we recommend clients deploy. The detection content now explicitly addresses unencrypted credentials in network traffic, certificate-chain anomalies that suggest TLS interception, and unusual metadata patterns in email flows that may indicate either compromise or unauthorised data exfiltration. The detection patterns are not novel; the framing as "metadata-aware monitoring" is sharper than my previous framing, and is changing what I direct the analysts to look for.
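The certificate-chain piece of that detection content reduces to a simple comparison. A minimal sketch of the idea, with pinned issuers and observations invented for illustration — real monitoring would pin full chains or key hashes, not issuer names:

```python
# Sketch of one "metadata-aware" detection: alert when the issuer of
# the certificate presented for a monitored domain differs from the
# issuer recorded at baseline — the signature of TLS interception or
# of a mis-issued certificate (the Comodo/DigiNotar pattern).
# All domains and issuer names below are invented.

pinned_issuers = {
    "mail.example.com": "Example Trust CA",
    "vpn.example.com":  "Example Trust CA",
}

observations = [
    ("mail.example.com", "Example Trust CA"),            # matches baseline
    ("vpn.example.com",  "Unexpected Interception CA"),  # anomaly
]

def issuer_alerts(pinned, observed):
    """Return (domain, seen_issuer) pairs where the presented chain's
    issuer does not match the pinned baseline; unmonitored domains
    are ignored."""
    return [(d, i) for d, i in observed if pinned.get(d) not in (None, i)]

print(issuer_alerts(pinned_issuers, observations))
# [('vpn.example.com', 'Unexpected Interception CA')]
```

The detection is cheap precisely because it never inspects payload: the certificate chain is metadata the monitoring point sees on every handshake.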

The next post is likely the year-end retrospective. There are a few weeks of 2012 left and I will be surprised if nothing further breaks before December, but the broader shape of the year is now visible enough that the writing-up can begin.

