Collection #1 · Peter Bassill

Troy Hunt published the analysis of Collection #1 this morning (troyhunt.com on Collection #1). The aggregation contains 772,904,991 unique email addresses and 21,222,975 unique passwords, drawn from approximately 2,000 individual breach corpora that were combined into a single archive sold and distributed in the secondary market over the past several months. The archive surfaced publicly through a hacker-forum listing in mid-December and has since been propagating across the various distribution sites that handle this kind of material.

The numbers are large, but the operational implications are continuous with the trend that has been visible since the 2012 LinkedIn breach data resurfaced in May 2016. The aggregate population of exposed credentials is now large enough that treating any password used and reused before approximately 2018 as known to adversaries is no longer a strong claim; it is the default assumption. The credential-stuffing risk against any service that authenticates by password against an email address has been operationally elevated for several years. The Collection #1 corpus does not change that fact; it adds to the data the credential-stuffing operators are working with.

For the customer briefings, the Collection #1 disclosure is the catalysing event for accelerating MFA rollout schedules that have been moving steadily but not urgently. Browne Jacobson is on TOTP for administrative populations and SMS for the wider firm; the 2019 plan is to migrate the SMS population to TOTP, with completion targeted for Q3. Towry is mostly on TOTP. Northcott uses Duo Mobile push and is in good shape. The manufacturer's seven-thousand-user MFA migration off SMS continues and is currently on track for Q4 completion. The financial-services firm is on hardware tokens for the privileged population and TOTP for the wider population. The retailer's TOTP migration is on schedule.

For the SOC operation, the credential-stuffing detection rules are being tuned for the post-Collection #1 environment. The cross-reference workflow against the Have I Been Pwned API has been a routine practice for several years, and the integration of the new corpus into the public Have I Been Pwned database happened over the weekend — Hunt's team handle the ingestion and the API exposes the results. Customer-organisation user populations whose corporate email addresses appear in the new corpus will be identified through the standard cross-reference cycle this week and forced through password reset and MFA enrolment as needed. The aggregate operational cost of the response is moderate and within the existing capacity of the SOC.
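The triage step of that cross-reference cycle can be sketched roughly as follows. This is a minimal illustration, not the SOC's actual tooling: the `User` record, the `triage_breached_users` function and the action labels are hypothetical names, and the set of breached addresses is assumed to have been obtained already from the Have I Been Pwned lookup (no API call is shown here).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class User:
    """Hypothetical directory record for one customer-organisation user."""
    email: str
    mfa_enrolled: bool


def triage_breached_users(
    users: Iterable[User], breached_emails: Iterable[str]
) -> Dict[str, List[str]]:
    """Map each affected user's address to the follow-up actions required.

    `breached_emails` is assumed to be the result of the cross-reference
    against the breach corpus, fetched elsewhere.
    """
    # Normalise case and whitespace so "Bob@Example.com " matches "bob@example.com".
    breached = {e.strip().lower() for e in breached_emails}
    plan: Dict[str, List[str]] = {}
    for user in users:
        key = user.email.strip().lower()
        if key not in breached:
            continue  # address not in the corpus: no action this cycle
        actions = ["password_reset"]  # every affected user resets
        if not user.mfa_enrolled:
            actions.append("mfa_enrolment")  # enrol only where MFA is missing
        plan[key] = actions
    return plan


# Illustrative data only; these addresses are made up.
users = [
    User("Alice@example.com", mfa_enrolled=True),
    User("bob@example.com", mfa_enrolled=False),
    User("carol@example.com", mfa_enrolled=False),
]
plan = triage_breached_users(users, ["alice@example.com", " BOB@example.com "])
```

With this sample data, the plan contains a password reset for Alice, a password reset plus MFA enrolment for Bob, and nothing for Carol, whose address is not in the breached set. Forcing the reset regardless of MFA status is a design assumption here; a real workflow might weigh it against the age of the exposed credential.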

The wider strategic point, repeated from previous years' writing because it remains the salient observation, is that password-only authentication for any service holding sensitive data is no longer defensible in 2019. The MFA-everywhere posture has been the right posture for several years; the implementation has been the bottleneck. The implementation continues. The customer-organisation conversations have, at this point, settled into an operational tempo rather than strategic resistance: the question is no longer "should we do MFA" but "what is the rollout path that minimises user-experience disruption". The progress has been steady; the Collection #1 disclosure is one more datapoint along the curve.

The blog will return to the credential-aggregation question with sufficient frequency that I am going to start treating these disclosures as a category to be written up in batches. There will, by the end of 2019, be more such corpora. The pattern is reliable enough to plan for.