LinkedIn 2012, four years late

The LinkedIn breach from June 2012, which the company at the time described as affecting approximately 6.5 million users (LinkedIn 2012 statement, blog.linkedin.com archived), has come back into circulation in a substantially larger form. Since the start of this month a seller on the Real Deal market has been offering an archive containing 167 million records, of which approximately 117 million include usable email-address-and-hashed-password pairs; LeakedSource, a breach-search service rather than a market, holds a copy of the same data. LinkedIn has acknowledged that the data appears to be authentic and is invalidating the passwords of affected accounts (LinkedIn corporate blog post, May 18). Troy Hunt's analysis at Have I Been Pwned has the most detailed structural breakdown (troyhunt.com on LinkedIn 2016).

The technical detail that matters operationally is the password storage. The 2012 LinkedIn data was unsalted SHA-1, a well-known and at the time loudly criticised choice. Around 90% of the new corpus has, on Hunt's analysis, already been cracked. Without a salt, each candidate password needs hashing only once to be tested against every account in the corpus, so on commodity hardware in 2016 the bulk of real-user passwords fall in a few hours, and the GPU rigs the password-cracking community has been building for years finish the long tail in days. The implication goes beyond LinkedIn passwords being compromised, which they demonstrably are: the credentials are now in circulation as plaintext, and credential-reuse attacks against every other service those email addresses access will have been under way for some weeks before today's news.
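To make the unsalted-hash point concrete, here is a minimal sketch of a dictionary lookup against unsalted SHA-1. The function names, the wordlist and the sample addresses are all hypothetical and chosen for illustration; real cracking runs use dedicated GPU tooling rather than anything like this. What it shows is only the structural weakness: one precomputed table serves every account, and two users with the same password share the same hash.

```python
import hashlib


def sha1_hex(password: str) -> str:
    """Unsalted SHA-1 of a password, as a lowercase hex digest."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def build_lookup(wordlist):
    """Hash each candidate once; with no salt, the table is valid for every account."""
    return {sha1_hex(word): word for word in wordlist}


def crack(leaked, lookup):
    """Map each email to its recovered plaintext, or None if the hash is not in the table."""
    return {email: lookup.get(digest) for email, digest in leaked.items()}


if __name__ == "__main__":
    # Hypothetical sample data for illustration; not taken from the corpus.
    wordlist = ["123456", "linkedin", "password"]
    leaked = {
        "a@example.com": sha1_hex("linkedin"),
        "b@example.com": sha1_hex("linkedin"),  # same password, same hash
        "c@example.com": sha1_hex("correct horse battery staple"),
    }
    for email, plaintext in crack(leaked, build_lookup(wordlist)).items():
        print(email, plaintext)
```

With a per-user salt, the table built by `build_lookup` would have to be rebuilt for each account, which is the cost that unsalted storage removes for the attacker.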

For the SOC and customer-organisation work this morning, the action is straightforward and unwelcome. Cross-reference our customers' user populations against the leaked email-address list. For the matches, force a password reset and an MFA enrolment. Communicate clearly to the affected users that the trigger is a 2012-era external breach, not a current breach of the customer organisation. Watch for credential-stuffing attempts against customer authentication endpoints over the coming weeks. The Sentry-style detection rules that we have for credential-stuffing — rate limits, geolocation anomalies, password-list-match scoring — are going to fire more this month than usual, and the triage volume on those alerts will need analyst attention.
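The cross-referencing step can be sketched roughly as below. This is an assumption-laden illustration, not our production tooling: `normalise` and `affected_users` are hypothetical names, the inputs are assumed to be plain iterables of address strings, and lowercasing the whole address is a pragmatic choice that treats mailboxes as case-insensitive.

```python
def normalise(email: str) -> str:
    """Trim whitespace and lowercase so trivial formatting differences do not hide a match."""
    return email.strip().lower()


def affected_users(customer_emails, leaked_emails):
    """Return the sorted, normalised customer addresses that also appear in the leaked list."""
    # Blank entries are dropped from the leaked side so an empty customer field cannot match.
    leaked = {normalise(e) for e in leaked_emails if e.strip()}
    return sorted({normalise(e) for e in customer_emails} & leaked)


if __name__ == "__main__":
    # Hypothetical sample data for illustration.
    customer = ["Alice@Example.com", "bob@example.com "]
    leaked = ["alice@example.com", "carol@example.org", ""]
    for address in affected_users(customer, leaked):
        print(address)
```

Holding the full leaked list in an in-memory set is the simplest approach but may not be practical at 117 million addresses on an analyst workstation; filtering the leaked list by the customer's email domains first is one way to keep the set small.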

The wider lesson this corpus reinforces is that breach disclosures structurally underestimate. The 6.5-million figure in 2012 was, on the evidence now available, off by a factor of approximately 25 on total records (167 million against 6.5 million), and by a factor of about 18 even counting only the 117 million usable credential pairs. The reason for the underestimate is not malice, since LinkedIn's incident response in 2012 worked on the data they could confirm at the time; it is that the actual extent of any compromise is hard to determine in the immediate aftermath, and disclosed numbers represent the lower bound. The Anthem disclosure in February 2015, the OPM disclosure in June 2015, and the Yahoo numbers that I expect to land later this year will all have similar revision patterns over time.

There is a piece of work I have been wanting to do for a while on the secondary market in old breach corpora. The seller of the LinkedIn data is also reportedly offering MySpace (around 360 million records), Tumblr (65 million), and various smaller sets. The pattern is that breaches happen, the data is exfiltrated, the data sits in private circulation among the technically capable for years, and at some point — usually for commercial reasons — it surfaces on a public market. The window between compromise and public surfacing is typically three to seven years. The implication for credential management is that any password used and reused against an internet service before approximately 2014 should be treated as known to adversaries. That is a strong claim. I think it is also accurate.

The vCISO conversation this quarter is going to include the following: every customer organisation should be on a path to enforcing organisation-wide MFA on any service holding sensitive data, with a deadline measured in months, not years, and the resistance to that posture from user populations needs to be addressed by communications work rather than by relaxing the policy. The Browne Jacobson programme is approximately halfway through MFA rollout; Towry is most of the way; Northcott is mostly done; the manufacturer is at the start of the programme. The next twelve months on each of those engagements involve making the rollouts operational rather than theoretical.