23andMe, and the data with the longest half-life

Last month 23andMe disclosed that attackers used credential stuffing against accounts opted in to relative-matching to scrape data on roughly 6.9 million people. The board lesson is about which data has the longest half-life — and it is not what most firms think.

On 6 October, 23andMe disclosed in an SEC filing that a threat actor had accessed the accounts of a subset of customers. By the end of the month the actor was publicly offering profiles of around 6.9 million people for sale. The company confirmed the scale and clarified that the access came not from a system compromise but from credential stuffing: taking usernames and passwords exposed in previous breaches at unrelated services and trying them against 23andMe accounts. Beyond the customers whose accounts were directly accessed, the data of millions of others was reached through the opt-in DNA Relatives feature, which lets users see other users they are genetically related to.

The board lesson here is not really about 23andMe. It is about which categories of data have the longest half-life, and which firms underestimate that half-life when they decide how seriously to protect it.

Genetic data is for life — and for your descendants

Most data breaches involve information whose value to an attacker decays. A stolen credit card number is useful for weeks before it is reissued. A stolen password is useful until the user changes it. A stolen address is useful as long as the person lives there.

Genetic data does not decay. The genome 23andMe stored on each of those customers is the same genome those customers will have for the rest of their lives. It is also the same genome — to a meaningful extent — that their children and grandchildren will have. The data that was leaked last month will still be relevant to those families in 2080.

This matters because the correct level of protection for data depends on how long it will be exploitable. A firm storing data that loses its value in months should protect it accordingly. A firm storing data that retains its value for a century should protect it accordingly. Current practice does not really differentiate. Genetic testing firms, identity-document scanning services, biometric verification platforms, and DNA-based health services hold data whose half-life is measured in decades. Most protect it as though that half-life were measured in years.

The 23andMe disclosure also includes a darker note. Some of the data appears to have been specifically filtered and offered for sale by ethnicity — Ashkenazi Jewish users in one tranche, ethnic Chinese users in another. The leaked information includes display names, sex, year of birth, partial postcodes, and in some cases genetic relatedness percentages. The use cases for that data, in the wrong hands, are not difficult to imagine.

The credential-stuffing context

The fact that the access came from credential stuffing rather than a direct system compromise does not absolve 23andMe, but it is structurally important. Credential stuffing works because users reuse passwords across services. The mitigations are well understood: enforce MFA, monitor for stuffing-shaped login patterns, rate-limit and lock accounts after repeated failures, and alert users when their email shows up in a known breach. Have I Been Pwned maintains the canonical reference dataset for credentials from prior breaches. Any large consumer service should be checking passwords against it, both when they are set and at login.
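
To make the breached-password check concrete, here is a minimal Python sketch against Have I Been Pwned's Pwned Passwords range endpoint. It uses the k-anonymity pattern: only the first five characters of the password's SHA-1 hash leave the service, and the comparison against the returned suffixes happens locally. The function names (fetch_range, breach_count) and the User-Agent string are illustrative assumptions, not anything 23andMe or Have I Been Pwned prescribes, and a production version would need error handling, caching, and a policy for what to do on a hit.

```python
import hashlib
import urllib.request
from typing import Callable

# Public Pwned Passwords range endpoint (k-anonymity lookup by hash prefix).
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def fetch_range(prefix: str) -> str:
    """Fetch every known hash suffix that shares a 5-character SHA-1 prefix."""
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "breach-check-sketch"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def breach_count(password: str, fetch: Callable[[str], str] = fetch_range) -> int:
    """Return how many times the password appears in the breach corpus (0 if none).

    Only the 5-character hash prefix is passed to `fetch`; the full hash and
    the password itself never leave this function.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    # Each response line has the form "<35-char hash suffix>:<count>".
    for line in fetch(prefix).splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0
```

A service might call breach_count when a password is set or presented at login, and prompt for a change or step-up verification when the result is non-zero. The fetch parameter exists so that the network lookup can be stubbed out in tests.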

23andMe did not require MFA before the incident. It is now making MFA mandatory and has required password resets for affected accounts. The pattern is recognisable: a service handling extremely sensitive data while applying the security model appropriate to a less sensitive consumer service. The mismatch is the structural failure. The credential stuffing was the proximate cause.

The opt-in feature design problem

The 6.9 million number is dramatic because of DNA Relatives, the opt-in feature that connects users to genetically related accounts. The attacker directly compromised only around 14,000 accounts. The rest of the roughly 6.9 million profiles were reached through those accounts' relative-matching graphs, an average of nearly 500 exposed profiles for every account compromised.

This is a design problem more than a security problem. A consent model that lets a single account see information about hundreds of other users, even at a high level, creates an attack surface that is not the user's own account. The user who opted in to DNA Relatives consented to their data being visible to relatives. They did not necessarily consent to that data being visible to whoever could compromise any single one of their relatives.
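
The cascade can be modelled with very little code. The Python sketch below treats relative-matching as a graph and computes which profiles become visible once a given set of accounts is compromised. The toy graph, the names, and the one-hop visibility rule are assumptions for illustration; they are not a description of how DNA Relatives is actually implemented.

```python
from typing import Dict, Iterable, Set


def exposed_profiles(relatives: Dict[str, Set[str]], compromised: Iterable[str]) -> Set[str]:
    """Profiles an attacker can see via the compromised accounts' match lists.

    Assumes one-hop visibility: each account sees only its direct matches.
    The compromised accounts themselves are excluded from the result.
    """
    owned = set(compromised)
    reached: Set[str] = set()
    for account in owned:
        reached |= relatives.get(account, set())
    return reached - owned


def amplification(relatives: Dict[str, Set[str]], compromised: Iterable[str]) -> float:
    """Average number of other users exposed per compromised account."""
    owned = set(compromised)
    if not owned:
        return 0.0
    return len(exposed_profiles(relatives, owned)) / len(owned)


# Hypothetical five-user relative graph (symmetric matches).
TOY_RELATIVES = {
    "alice": {"bob", "carol", "dan"},
    "bob": {"alice", "erin"},
    "carol": {"alice"},
    "dan": {"alice"},
    "erin": {"bob"},
}

print(sorted(exposed_profiles(TOY_RELATIVES, ["alice"])))  # ['bob', 'carol', 'dan']
print(amplification(TOY_RELATIVES, ["alice", "bob"]))      # 1.5
```

Run against a real, suitably access-controlled graph, the same two functions give a first estimate of worst-case exposure per compromised account, which is the number a consent model for graph-shaped data needs to account for.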

This is the broader problem with consent in graph-shaped data. The user makes a decision about their data, but the decision implicates the data of every connected node. Genetic data is an unusually clean example because the graph is biological. The same shape problem exists in social graphs, financial graphs, communications graphs, and increasingly in machine-learning training corpora.

The product design lesson — for any firm building services on graph data — is to think carefully about whether the consent obtained from each user is sufficient to cover the downstream visibility of connected users' data. The answer is often no, and the answer matters more once the underlying data has a long half-life.

What boards should ask

Three questions, for any firm holding long-half-life data on customers.

What is the half-life of the most sensitive data we hold? Is our protection calibrated to that half-life rather than to the typical short-half-life consumer model? For most firms, the answer to the second question is no, even when the first answer is decades.

Have we enabled and required MFA on all customer accounts, particularly for accounts with elevated access to other users' data? If the answer is "we offer it but do not require it", the next investigation in this category by the UK Information Commissioner's Office (ICO) will set the precedent for whether that is enough.

If our user-facing consent model permits sharing or visibility to other users, have we modelled the cascade effect of a single compromised account? Most firms have not. Most should.

The regulatory tail

The ICO has opened an inquiry into the 23andMe incident, in conjunction with the Office of the Privacy Commissioner of Canada. The likely outcome, over the next twelve to twenty-four months, is enforcement action whose specifics will set new expectations for genetic and biometric data handlers. Firms in that category should be reading the next ICO publication carefully.

For everyone else, the 23andMe story is a reminder that the question of which data is sensitive does not have a fixed answer. Sometimes it is the data nobody thought to put on the high-priority list. Often, the half-life is the giveaway.