Least certain exactly where it has to decide: the Home Office age guesser

A facial age-estimation system whose error margin is widest at the one line it exists to draw is not a decision aid. Setting the immigration politics aside, it fails on accuracy and privacy alone.

Let me set the politics of asylum to one side, because the case against this particular system does not need them. I want to look at the Home Office's plan to use AI facial age estimation on asylum seekers purely as I would look at any system put in front of a board: does it do what it claims, and is the data handling defensible? On both counts, the answer this week is no — and the reasons are the ordinary ones I spend my working life on, not the political ones the headlines reach for.

What is actually being bought

The thing itself is modest in size and large in consequence. The Home Office has signed a three-year contract worth around £322,000 with the German biometrics firm Cognitec, testing facial age estimation through 2026 with a view to using it at the border from 2027. The official framing is careful: the technology is meant to support initial age decisions, a benchmark to give front-line staff more confidence, not a standalone arbiter. This week more than sixty rights groups, Amnesty, Human Rights Watch and Liberty among them, wrote to the government urging it to scrap the plan, and The Register reported the technology being branded biased and inaccurate. I am less interested in the adjectives than in the numbers underneath them.

The accuracy problem is not that it is bad. It is where it is bad.

Every estimator has an error margin. That is not damning on its own; a system can be useful while being imperfect. What matters is whether the error lands where the decision is hard or where it is easy. Here it lands in precisely the wrong place. The Home Office's own position concedes that these systems are least precise around the 16-to-18-year-old boundary, with even the best performers carrying an error margin of roughly two and a half years in that range. That boundary is the only line the system exists to help draw, and a two-and-a-half-year margin is wider than the two-year band it is supposed to resolve. A tool whose error margin is widest exactly at the threshold it is meant to decide is not reducing uncertainty at the point of decision; it is dressing up a coin-flip in a lanyard and a procurement reference.
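To put a rough number on that, here is a minimal sketch. It is not the Home Office's or Cognitec's method. It assumes, purely for illustration, that the estimate equals true age plus Gaussian noise, and that the two-and-a-half-year margin can be read as one standard deviation. Neither assumption comes from the published material, and the function name and parameters are mine.

```python
import math

def prob_estimated_adult(true_age, sigma=2.5, threshold=18.0):
    """Chance that a noisy age estimate lands at or above the threshold.

    ASSUMPTION (illustrative only): estimate = true_age + Normal(0, sigma),
    with sigma standing in for the quoted ~2.5-year error margin.
    """
    z = (threshold - true_age) / sigma
    # 1 - Phi(z), written with the error function from the standard library
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

if __name__ == "__main__":
    for age in (12, 15, 16, 17, 18, 21):
        p = prob_estimated_adult(age)
        print(f"true age {age}: estimated as adult with probability {p:.3f}")
```

Under those assumptions a seventeen-year-old is estimated as an adult roughly a third of the time, while a twelve-year-old almost never is. The error bites only where the decision is hard.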

It gets worse when you ask who the error falls on. Facial age estimation models are trained predominantly on western, white-majority datasets and skew male, and independent evaluation has repeatedly shown accuracy varies by skin tone, sex and geography — NIST's testing finds performance consistently weaker for some demographic groups than others. The population this system will be pointed at is overwhelmingly young, predominantly people of colour, and frequently female. That is the demographic intersection where these models are documented to be least reliable. You could hardly design a worse alignment between a tool's known weakness and its intended use. Layer on the human confounders the rights groups raise — malnutrition, sleep deprivation, the physical toll of a long and violent journey, all of which make a child look older — and the residual accuracy you started with erodes further in exactly the cases that most need to be got right.
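A headline average can hide all of this, which is why the breakdown matters. The sketch below shows the kind of disaggregated figure I mean: the share of actual children estimated as adults, per group and overall. The records, group labels and function name are invented for illustration and are not real evaluation data from any vendor or from NIST.

```python
from collections import defaultdict

def child_as_adult_rate_by_group(records, threshold=18.0):
    """records: iterable of (group, true_age, estimated_age).

    Returns, per group and under the key "ALL", the fraction of people
    truly under the threshold whose estimate is at or above it.
    """
    wrong = defaultdict(int)
    total = defaultdict(int)
    for group, true_age, estimated_age in records:
        if true_age >= threshold:
            continue  # only children can be wrongly classed as adults
        for key in (group, "ALL"):
            total[key] += 1
            if estimated_age >= threshold:
                wrong[key] += 1
    return {key: wrong[key] / total[key] for key in total}

# HYPOTHETICAL toy records, made up to show the calculation only.
toy_records = [
    ("group_a", 16, 16.5), ("group_a", 17, 17.2),
    ("group_a", 15, 15.9), ("group_a", 17, 18.4),
    ("group_b", 16, 18.1), ("group_b", 17, 19.0),
    ("group_b", 15, 16.0), ("group_b", 17, 18.6),
    ("group_a", 22, 21.0),  # adult, ignored by the child-as-adult rate
]

rates = child_as_adult_rate_by_group(toy_records)

if __name__ == "__main__":
    for key, rate in sorted(rates.items()):
        print(f"{key}: {rate:.2f}")
```

In this toy data the overall rate sits halfway between two very different group rates, so quoting only the overall figure would understate the error for the group the tool reads worst.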

The privacy problem is the one nobody has answered

Set accuracy aside for a moment and the data-protection picture is, if anything, weaker, because there is so little of it to examine. We are talking about the systematic processing of biometric data (special category data under UK GDPR) belonging to children, by an outsourced third party, to make decisions that determine whether a child is treated as a child. That is about as high-stakes a processing operation as exists, and it is the textbook trigger for a Data Protection Impact Assessment under Article 35. Yet officials have, as the rights groups point out, published neither a Data Protection Impact Assessment nor an Equality Impact Assessment, nor the detailed methodologies and results behind the accuracy claims. A DPIA is not a formality you produce after deployment to satisfy an auditor. It is the document in which you are supposed to confront necessity, proportionality, the demographic-bias risk above and retention. It is also where you say what happens to a child's facial template once the decision is made, and who else in government might later decide that template is useful for something it was never collected for. If that assessment exists and stands up, publish it. If it does not exist, you are processing children's biometrics at the border without having done the one piece of work the law requires you to do first.

Why "just a support tool" does not save it

The reassurance that this only supports a human decision is doing more work than it can bear. We know how decision-support systems behave in practice: a number on a screen with the authority of a machine anchors the human reviewer, particularly an overworked one, and particularly when overruling it means taking personal responsibility for being wrong. Call it automation bias or just call it human nature. A "support" tool that is least accurate at the threshold, most biased against the population it is used on, and unaccompanied by a published impact assessment does not become safe because a person signs underneath it. It becomes a way of laundering an unreliable estimate into an official one.

Where I land

I would not deploy this, and my reasons have nothing to do with where anyone stands on immigration. I would not sign it off for a commercial client either, because it fails the two tests I would apply to any system: it is least accurate at the precise point of decision, and the privacy governance that should precede processing children's biometric data has not been done in public. Those are not problems you tune away with a better model next year; they are problems baked into using face-based age estimation for a binary legal threshold at all.

What would change my mind is narrow and specific. Publish the DPIA and the Equality Impact Assessment. Publish the accuracy figures broken down by skin tone, sex and age band rather than as a headline average. Show what the false-classification rate is for a sixteen-year-old girl with darker skin, not for the comfortable middle of the distribution. And state plainly how a contested estimate is escalated to a proper, non-biometric age assessment, and how the biometric template is destroyed afterwards. Do that work and there is a conversation to be had. Until then, this is a system being asked to be most certain exactly where it is least able to be, on the people it is least able to read, with the paperwork that should justify it still unpublished. On accuracy and on privacy, that is not a close call.