LinkedIn · Peter Bassill

The LinkedIn breach a week ago (6.5 million unsalted SHA-1 password hashes posted to a Russian forum on the fifth of June, of which more than half were cracked within forty-eight hours) is the depressingly familiar 2012 example of the password-storage problem the security community has been writing about for fifteen years. The technical detail is uninteresting: SHA-1 hashes, no salt, somebody got a copy of the user table, and the hashes are now public. Password reuse means that any LinkedIn user whose password matched their email password (between forty and sixty per cent of users on every survey I have seen) is now in a worse position than they realise. LinkedIn has confirmed the breach, forced password resets, and announced that it will implement salted hashing for new passwords going forward. Last.fm and eHarmony disclosed similar breaches the same week, with similar shapes and similar root causes.

The thing I want to write down is not the breach itself but what it tells us about the operational maturity of the public-internet platform sector seven years after the first wave of substantial public-internet password-database leaks. We have known how to store passwords properly since at least the 1980s. The various industry best-practice writeups have been clear for years that the right shape is bcrypt, scrypt, or PBKDF2 with substantial work factors, with per-password salt, and ideally with a peppering scheme on top. LinkedIn, in 2012, was using unsalted SHA-1. That is not a 2012 mistake; that is a 1990s mistake which has been allowed to continue into 2012 because the cost of fixing it has, in LinkedIn's prior risk register, been judged too high relative to the cost of leaving it. Brian Krebs's coverage over the past week is the right primary source for the operational chronology and for the various third-party-cracker analyses of the dump.
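
For readers who want to see the shape rather than read about it, here is a minimal sketch of per-password salt plus a work factor plus a pepper, using PBKDF2 from the Python standard library. The function names, the record format, the iteration count and the HMAC-based pepper step are all my own assumptions for illustration, not a description of what LinkedIn or anyone else runs; bcrypt or scrypt would fit the same outline.

```python
import hashlib
import hmac
import os

# Hypothetical parameters for illustration only; pick real values by measurement.
ITERATIONS = 200_000
SALT_BYTES = 16


def hash_password(password: str, pepper: bytes, iterations: int = ITERATIONS) -> str:
    """Return a storable record: algorithm$iterations$salt_hex$digest_hex."""
    # A fresh random salt for every password, stored alongside the digest.
    salt = os.urandom(SALT_BYTES)
    # Pepper: a secret kept outside the database, mixed in with HMAC (assumed scheme).
    peppered = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    # The iteration count is the work factor: it makes every guess expensive.
    digest = hashlib.pbkdf2_hmac("sha256", peppered, salt, iterations)
    return "pbkdf2_sha256${}${}${}".format(iterations, salt.hex(), digest.hex())


def verify_password(password: str, record: str, pepper: bytes) -> bool:
    """Recompute the digest from the stored salt and iterations, then compare."""
    algorithm, iterations, salt_hex, digest_hex = record.split("$")
    if algorithm != "pbkdf2_sha256":
        return False
    peppered = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    candidate = hashlib.pbkdf2_hmac(
        "sha256", peppered, bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison to avoid leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Storing the algorithm name and iteration count inside the record is a deliberate choice in this sketch: it lets the work factor be raised later without invalidating records written under the old setting.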

The operational lesson is that "we hash passwords" is not, on its own, a credible security claim. It needs to be "we hash passwords with bcrypt/scrypt/PBKDF2, per-password salt, with work factors that have been reviewed in the past eighteen months". I have been adding a question to the Hedgehog engagement scope for some months — "what is your password-storage scheme" — and the answers have been, on average, worse than I expected. About a third of the engagements I have run in 2011 and 2012 have had unsalted hashing somewhere in scope. Several have had cleartext password storage in legacy systems that have not been migrated. The LinkedIn incident is the public reference I will be using to make the case for fixing this; the cost of fixing is roughly two engineer-weeks per affected system and the cost of not fixing is the LinkedIn-shaped public spectacle when the inevitable breach lands. Jeff Atwood's "Speed Hashing" post from April is, by some distance, the most accessible explanation of why work factors matter, and it is the one I have been pointing developers at when the conversation is about whether the upgrade is justified.
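
What "reviewing a work factor" might look like in practice can be sketched in a few lines: time a probe run of PBKDF2 on the production hardware and scale the iteration count to a target latency. The function name, the 0.1 second target and the 10,000 floor are hypothetical values chosen for the example, and a single timing run is a rough estimate rather than a proper benchmark.

```python
import hashlib
import os
import time


def calibrate_pbkdf2(target_seconds: float = 0.1, floor: int = 10_000) -> int:
    """Estimate a PBKDF2-SHA256 iteration count costing about target_seconds here."""
    probe = 20_000  # iterations used for the timing run (arbitrary choice)
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"benchmark-password", salt, probe)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return floor
    # PBKDF2 cost is linear in the iteration count, so scale proportionally.
    estimate = int(probe * target_seconds / elapsed)
    # Never go below the floor, however slow the host turns out to be.
    return max(floor, estimate)
```

Re-running something like this whenever the authentication servers are replaced, and recording the result, is one way to make the eighteen-month review a concrete artefact rather than a line in a policy document.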

The wider point is that password reuse is a structural problem that platform operators cannot solve on their own. LinkedIn cannot prevent a user from using the same password on Gmail, on banking, or anywhere else. The structural answer is two-factor authentication everywhere it matters, but two-factor deployment in the consumer space is still essentially a luxury feature more than a year after Google rolled it out for Gmail. The LinkedIn-aftermath conversations I have been having with clients have been "how do we encourage our users to use unique passwords" rather than "how do we provide structurally better authentication", which is a statement about the maturity of the sector rather than about any individual client.

For the engagements with sensitive customer data — News International, Browne Jacobson, the Hedgehog clients with public-facing services — I have been pushing harder on the post-breach conversation about credential reuse. The LinkedIn dump means that for any user who registered an account on a client site using their LinkedIn email and a similar password, an attacker who downloaded the dump can now try those credentials against the client's authentication. The credential-stuffing attack is operationally easy and technically straightforward; the defensive answer is monitoring for credential-stuffing patterns at the authentication layer, which most clients are not doing. I have been adding it to the Hedgehog SOC detection brief.
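
One simple credential-stuffing pattern is a single source failing logins against many different usernames in a short window, which is what replaying a dump looks like from the inside. The sketch below is an illustration under my own assumptions: the class name, the five-minute window and the threshold of ten distinct usernames are hypothetical, events are assumed to arrive in timestamp order, and a real detection would also have to cope with attackers who spread attempts across many addresses.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class StuffingDetector:
    """Flag a source IP that fails logins for many distinct usernames in a window."""

    def __init__(self, window_seconds: float = 300.0, max_distinct_users: int = 10):
        self.window_seconds = window_seconds
        self.max_distinct_users = max_distinct_users
        # Per-source queue of (timestamp, username) failed-login events.
        self._failures: Dict[str, Deque[Tuple[float, str]]] = defaultdict(deque)

    def record_failure(self, source_ip: str, username: str, timestamp: float) -> bool:
        """Record one failed login; return True if the source now looks suspicious."""
        events = self._failures[source_ip]
        events.append((timestamp, username))
        # Drop events that have aged out of the sliding window.
        while events and events[0][0] < timestamp - self.window_seconds:
            events.popleft()
        # Count distinct usernames, so one user mistyping repeatedly is not flagged.
        distinct = {user for _, user in events}
        return len(distinct) > self.max_distinct_users
```

Counting distinct usernames rather than raw failures is the design choice that separates this from ordinary brute-force lockout logic: a user who forgets their password produces many failures for one account, whereas a replayed dump produces one or two failures each for many accounts.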

The salting-versus-not-salting debate is the part of this story that has produced the most heat and the least light over the past week. The technical position is straightforward: per-password salts defeat precomputed rainbow tables and stop an attacker from testing one guess against every hash in the dump at once, but they do not stop an attacker who has the database and is willing to run a dictionary attack against each hash individually. The salting question is therefore "will the next attacker who steals our database rely on a rainbow table or run a per-hash dictionary attack". The honest answer for a database the size of LinkedIn's is that any sufficiently motivated attacker will run the per-hash dictionary attack regardless of salting, because a fast hash such as SHA-1 makes each guess cheap and the hash count is large enough to make the per-hash work worthwhile. The salting decision is therefore not the binary defence the popular discussion treats it as; the work-factor decision (bcrypt with a high work factor, or PBKDF2 with a high iteration count) is what actually buys time for password resets. LinkedIn was missing both salt and work factor; the fix needs to address both.
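
The difference between the two attacks in the paragraph above can be made concrete by counting hash computations. The toy sketch below is an illustration only: the function names are mine, the salted scheme is assumed to be SHA-1 over salt followed by password, and it uses a plain dictionary attack rather than an actual rainbow table. It is meant to show the cost structure, not a real cracking tool.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def crack_unsalted(stored_hashes, wordlist):
    """One hash per guess, checked against every stored hash at once."""
    targets = set(stored_hashes)
    cracked, hash_ops = {}, 0
    for guess in wordlist:
        digest = sha1_hex(guess.encode("utf-8"))
        hash_ops += 1
        if digest in targets:
            cracked[digest] = guess
    return cracked, hash_ops


def crack_salted(stored_records, wordlist):
    """Each (salt, hash) record needs its own pass over the wordlist."""
    cracked, hash_ops = {}, 0
    for salt, stored in stored_records:
        for guess in wordlist:
            hash_ops += 1
            if sha1_hex(salt + guess.encode("utf-8")) == stored:
                cracked[(salt, stored)] = guess
                break  # move on to the next record once this one falls
    return cracked, hash_ops
```

Without salt the work scales with the size of the wordlist alone; with salt it scales with wordlist size times the number of accounts. Weak passwords still fall in both cases, because each individual SHA-1 computation remains cheap, which is the gap a work factor is meant to close.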

The next post is probably the Hedgehog SOC build update (we have the office, the first analyst has started, and the Splunk infrastructure is being deployed this week) or the rumoured Saudi Aramco incident that several of my correspondents are hearing about and which is allegedly substantial. Or possibly Flame's continued unfolding, on which Kaspersky has been publishing additional analyses over the past week.