Between August and October 2025, His Majesty’s Revenue and Customs suspended Child Benefit payments to roughly 23,800 UK families on the strength of Home Office travel data that the agency knew, before deployment, had carried a 46 per cent error rate in its pilot. Of those families, 71 per cent were subsequently confirmed as legitimate claimants whose payments should never have stopped (HMRC, 2026; Shabad, 2026). The error rate was documented. The pass and fail thresholds were not. The suspensions proceeded anyway. For Australian readers, the case is uncomfortably familiar because it follows the same sequence as Robodebt: a known-flawed input, no calibrated tolerance for harm, and an accountability structure diffuse enough that no single decision-maker was clearly answerable for the outcome.
On paper, the UK control logic is reasonable. Welfare programmes do have residency rules. Border travel-record cross-checks are an inexpensive way to flag potential non-residence. The defensible version of the policy would have triaged matches against the known error rate, sampled before suspending at scale, and built in a fast pathway for legitimate claimants to clear themselves before any payment was withheld. The deployed version did none of those things. The written evidence to the House of Commons Public Accounts Committee describes the rollout as having proceeded despite internal risk advice flagging the error rate, with no documented decision rule for when the data could and could not be relied upon, and with the burden of proof falling on claimants rather than the agency (Shabad, 2026).
The structural resemblance to Robodebt is no coincidence. The Royal Commission into the Robodebt Scheme catalogued, in 990 pages, exactly the failure modes that the HMRC case re-runs: suppressed risk advice, absent pass and fail thresholds, automated outputs treated as authoritative rather than as flags for human review, and accountability that diffused across the chain of decisions until no specific official carried it (Royal Commission into the Robodebt Scheme, 2023). The Commission’s 57 recommendations were designed to install the missing pre-deployment scaffolding. Two and a half years after the final report, the Commonwealth has accepted most of those recommendations in principle. It has not, however, legislated the architectural changes that would make recurrence structurally harder.
The fiscal residue is still visible. Treasury has carried Robodebt-related provisions in successive budgets as an unquantifiable contingent liability, the language that May 2026 reporting used to describe ongoing settlements, class-action exposure and remediation costs whose endpoint remains unclear (InnovationAus, 2026). The notional cost of building governance discipline in before deployment is always smaller than the actual cost of building it back in after a Royal Commission. The HMRC case is at the early end of that curve. The Australian case is at the late end. The shape of the curve does not change.
The international comparators reinforce the point. The Netherlands’ System Risk Indication (SyRI) was struck down by the District Court of The Hague on 5 February 2020 on the grounds that the system breached Article 8 of the European Convention on Human Rights, with the court citing both opacity (no insight into the risk indicators or the operation of the risk model) and a failure to balance the privacy intrusion against the public interest in fraud detection (NJCM v The Netherlands, 2020). France’s national family-benefits agency (the CNAF) is now defending its risk-scoring algorithm before the Conseil d’État in a case filed in October 2024 by La Quadrature du Net and a coalition that has since grown to twenty-five civil society organisations; the algorithm assigns a suspicion score that disproportionately flags single-parent households (95 per cent of whom are headed by women) and recipients of low-income or disability allowances for audit (La Quadrature du Net, 2024). Denmark’s Udbetaling Danmark came under formal scrutiny from Amnesty International in November 2024 for a fraud-detection ecosystem that runs more than sixty machine-learning models against personal data drawn from a wider set of public registers than the statutory authority appeared to contemplate (Amnesty International, 2024). The Robodebt Royal Commission examined comparative material from this set of cases. The recommendations it produced were calibrated to a problem that is by now well documented across at least four jurisdictions.
What none of those jurisdictions has yet solved is the procurement layer. Commonwealth procurement rules treat algorithmic decision systems as ICT acquisitions, which means the deliverable specifications focus on functional requirements, integration, security and the standard schedule of contractual protections. There is no equivalent of the structural-engineering certification that a Commonwealth building project would carry. There is no required artefact that documents which decisions the system is authorised to make, what the assumed error rate is, where the escalation triggers sit, or how the system is to be decommissioned if the assumed error rate proves wrong. I have written separately about the Commonwealth AI procurement rule changes earlier this year. The rule-tightening has run ahead of suppliers’ capability to deliver on the new standards, which is its own gap. The deeper gap is that procurement still does not ask for the architectural artefacts that would make an automated decision system legible to the agencies operating it.
The UK is at least moving on the legislative piece. Lord Clement-Jones introduced the Public Authority Algorithmic and Automated Decision-Making Systems Bill as a private member’s bill in the 2024-25 session. The Bill completed its Lords stages and finished its Commons stages on 17 March 2026, against a backdrop in which the HMRC case had sharpened parliamentary appetite for the kind of architecture the Bill describes. As drafted, the Bill requires public authorities to conduct algorithmic impact assessments before deploying automated decision systems, to register systems on a public list, and to provide a defined route to challenge automated outcomes. Whether it survives the post-passage period with those teeth intact will turn on how the implementing regulations are written. What is clear is that Australia, with 57 Robodebt recommendations on file, has no equivalent Commonwealth legislation currently in train. The NDIS reform implementation work I have written about elsewhere is running into the same structural problem: an automation review pathway built on top of a service design that does not yet have its own accountability architecture in place.
Pre-deployment accountability is the only intervention that prevents the next iteration of the pattern. It is a less marketable formulation than “responsible AI” and considerably less marketable than the various assurance frameworks already in the literature. The mechanics are unglamorous. A decision-rights map that says, in writing, which official is authorised to suspend payments, which is authorised to override the model output, and which holds the residual accountability for the systemic outcome. An assumption register that captures the error rate the system is designed around and the operational definition of acceptable harm. Escalation triggers that fire automatically when the model output diverges from sampled human review beyond a calibrated tolerance. Model documentation as a procurement deliverable, with the same status as a security certification.
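To make two of those mechanics concrete, here is a minimal sketch of an assumption register feeding an escalation trigger. It is an illustration under stated assumptions, not a description of any agency's actual system: the names `AssumptionRegister`, `observed_error_rate` and `should_escalate` are hypothetical, as are the 5 per cent design assumption, the 2-point tolerance and the minimum sample size. The only figure taken from the case above is the 46 per cent pilot error rate, used here as the sampled disagreement rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssumptionRegister:
    """Hypothetical register entry; field names and values are illustrative."""
    assumed_error_rate: float   # error rate the system was designed around
    tolerance: float            # calibrated margin above that rate before escalating
    accountable_official: str   # role holding residual accountability
    min_sample_size: int        # smallest human-review sample treated as reliable


def observed_error_rate(model_flags: list[bool], human_review: list[bool]) -> float:
    """Share of sampled cases where human review disagreed with the model output."""
    if len(model_flags) != len(human_review):
        raise ValueError("each sampled case needs a model flag and a review outcome")
    if not model_flags:
        raise ValueError("cannot compute an error rate from an empty sample")
    disagreements = sum(m != h for m, h in zip(model_flags, human_review))
    return disagreements / len(model_flags)


def should_escalate(register: AssumptionRegister,
                    model_flags: list[bool],
                    human_review: list[bool]) -> bool:
    """Fire when sampled review diverges from the model beyond the tolerance."""
    if len(model_flags) < register.min_sample_size:
        # Too little evidence to rely on the system: escalate rather than proceed.
        return True
    rate = observed_error_rate(model_flags, human_review)
    return rate > register.assumed_error_rate + register.tolerance


if __name__ == "__main__":
    register = AssumptionRegister(
        assumed_error_rate=0.05,          # illustrative, not a real threshold
        tolerance=0.02,                   # illustrative, not a real threshold
        accountable_official="(hypothetical) senior responsible officer",
        min_sample_size=100,
    )
    # 100 sampled cases, all flagged by the model; human review upholds only 54.
    flags = [True] * 100
    review = [True] * 54 + [False] * 46
    print(should_escalate(register, flags, review))  # True
```

The design choice worth noting is that the tolerance and the accountable role are written down before any sample is drawn. A 46 per cent disagreement rate then has a pre-agreed consequence instead of being left to judgment after the fact.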
None of these mechanics are mysterious. They are routine in domains where safety-of-life is at stake and where regulators have learnt, expensively, that retrospective accountability is not a substitute for pre-deployment scaffolding. Aviation certification works this way. Pharmaceutical approval works this way. The reason public-sector algorithmic decision systems do not yet work this way is partly capability (the consulting and digital-transformation industry has been slow to develop pre-deployment accountability as a standing engagement type), partly procurement convention, and partly the political incentive to ship a visible deliverable before the governance scaffolding has been built around it.
That last incentive is the hardest to legislate against. It is also the one that keeps producing carbon copies. The HMRC case is not an outlier. It is what happens when an automation programme is delivered on the same schedule as the funding cycle that procured it, with the governance work treated as something that can be retrofitted later if challenged. The Robodebt Royal Commission is the evidence that retrofitting later is the more expensive path. The Lords Bill is the evidence that at least one Westminster jurisdiction has now read that evidence and concluded that legislation is the cheaper option. The work is the same, in the end, whether it happens before deployment, when the system can absorb it, or after a Royal Commission asks why it wasn’t.
References
Amnesty International. (2024). Coded injustice: Surveillance and discrimination in Denmark’s automated welfare state.
HMRC. (2026, January). Evidence of John-Paul Marks (Chief Executive) to the Treasury Select Committee on Child Benefit suspensions. UK Parliament.
InnovationAus. (2026). Robodebt’s unquantifiable contingent liability. May 2026 reporting on the automated welfare compliance saga.
La Quadrature du Net. (2023). Notation des allocataires: l’inacceptable discrimination algorithmique de la CAF.
NJCM v The Netherlands. (2020). District Court of The Hague, Case C/09/550982. Judgment on the System Risk Indication (SyRI).
Royal Commission into the Robodebt Scheme. (2023). Report. Commonwealth of Australia.
Written Evidence to UK Parliament. (2026). HMRC’s Anti-Fraud Intervention on Child Benefit. Zenodo deposit 20205530. https://doi.org/10.5281/zenodo.20205530
