The Hallucination Toll
979 documented cases. AI doesn’t just make things up — it makes things up convincingly enough to cost real money and real freedom.
A lawyer in Florida cites a case that doesn’t exist. A patient follows medical advice that no doctor gave. An investor acts on financial data that was generated, not retrieved. A Colombian attorney faces sanctions for submitting ten fabricated cases and legal norms to a court.
These aren’t hypothetical scenarios. They’re entries in Damien Charlotin’s AI Hallucination Cases Database, which has documented 979 cases of AI hallucinations causing real-world harm. Of those, 798 involved fabricated case law — complete with convincing case names, realistic legal reasoning, and entirely invented outcomes.
The toll is measured in careers and trust. And the problem isn’t improving fast enough to keep pace with adoption.
What the numbers show
A 2024 Stanford study (“Large Legal Fictions,” Stanford RegLab and HAI) found that when asked about a court’s core ruling, large language models hallucinated between 69% and 88% of the time, depending on the model and query type. The models did not just get details wrong: they invented court cases that do not exist and supplied plausible reasoning and outcomes for them.
The database tracks who made the errors. Of the 979 documented cases, 559 involved pro se litigants (people representing themselves) and 394 involved licensed attorneys. The consequences range from warnings to fines of $1,000 or more to bar referrals. In February 2026 alone, multiple new cases were filed:
Kusmin Amarsingh cited seven fabricated cases from ChatGPT in her appeal against Frontier Airlines (10th Circuit, Case No. 24-1391). The result: a $1,000 fine and a bar referral.
Thomas Dodds submitted AI-generated false quotes in Dodds v. Bridges. The result: a warning.
An unnamed attorney in Virgil v. Experian submitted 11 fabricated cases and misrepresentations. The result: a recommended fine and bar referral.
In Colombia, attorney Jorge Hernan Zapata Vargas was fined approximately $6,000 by the Supreme Court for submitting fabricated laws and precedents generated by AI.
The geographic distribution spans at least five countries: 683 cases in the United States, 61 in Canada, 60 in Australia, 42 in Israel, and 39 in the United Kingdom.
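The tallies above are easier to compare as shares of the 979 total. The following is a minimal sketch that only restates the counts quoted in this article; the function names are illustrative, and the "not broken out" remainder is simply what is left after subtracting the listed categories, not a figure taken from the database.

```python
# Counts copied from the figures quoted in this article.
TOTAL_CASES = 979

by_filer = {"pro se litigants": 559, "licensed attorneys": 394}
by_country = {
    "United States": 683,
    "Canada": 61,
    "Australia": 60,
    "Israel": 42,
    "United Kingdom": 39,
}


def shares(counts, total):
    """Return each count as a percentage of total, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def unlisted(counts, total):
    """Cases in the total that the listed categories do not account for."""
    return total - sum(counts.values())


for label, counts in (("filer type", by_filer), ("country", by_country)):
    for name, pct in shares(counts, TOTAL_CASES).items():
        print(f"{name}: {pct}%")
    print(f"not broken out by {label}: {unlisted(counts, TOTAL_CASES)}")
```

Run as written, this shows the United States accounting for roughly seven in ten cases, with 94 cases falling outside the five named countries and 26 outside the two named filer types.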
The rates
Not all AI systems hallucinate at the same rate. On document summarization tasks, the best models in 2026 have achieved sub-1% hallucination rates — Google’s Gemini-2.0-Flash at 0.7%, OpenAI’s o3-mini-high at 0.8%, according to Vectara’s Hallucination Evaluation Leaderboard.
But those benchmark numbers measure something narrow: whether a model stays faithful to a source document when summarizing it. Open-ended generation — where there’s no source document to stay faithful to — is a different problem entirely. The Stanford legal study found hallucination rates of 69-88% when models were asked about court rulings without a source document to ground them. The gap between “summarization accuracy” and “generation accuracy” is where the real damage happens.
Retrieval-Augmented Generation, a technique that grounds model outputs in external data sources, can significantly reduce hallucination rates when properly implemented. But “properly implemented” is doing a lot of work in that sentence. A Stanford study on legal RAG tools found hallucination rates of 17-33% even with retrieval augmentation. Most consumer-facing AI applications don’t use RAG at all.
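To make the grounding idea concrete, here is a toy sketch of the retrieval half of RAG. It is an illustration under loud assumptions, not how any production tool works: the corpus is a made-up two-entry dictionary, relevance is plain word overlap rather than embedding search, and no model is actually called. The function names are hypothetical.

```python
import re

# Made-up corpus for illustration; a real system would index actual documents.
corpus = {
    "policy-a": "Refunds are available within 30 days of purchase.",
    "policy-b": "Shipping takes five business days.",
}


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for real text processing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, corpus, k=2):
    """Rank documents by shared words with the query; keep the top k with any overlap."""
    q = tokenize(query)
    scored = [(len(q & tokenize(text)), doc_id) for doc_id, text in corpus.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def build_grounded_prompt(query, corpus, k=2):
    """Assemble a prompt that restricts the model to retrieved passages.

    Returns None when nothing relevant is found, so the caller can decline
    to answer instead of letting the model generate without a source.
    """
    doc_ids = retrieve(query, corpus, k)
    if not doc_ids:
        return None
    sources = "\n".join(f"[{d}] {corpus[d]}" for d in doc_ids)
    return (
        "Answer using only the sources below and cite them by id. "
        "If they do not contain the answer, say so.\n"
        f"{sources}\n"
        f"Question: {query}"
    )


print(build_grounded_prompt("Are refunds available within 30 days?", corpus, k=1))
```

The design choice worth noticing is the None return: when retrieval finds nothing, the pipeline has no source to stay faithful to, which is exactly the open-ended generation case described above. Retrieval quality also bounds everything downstream, which is one plausible reason RAG tools still hallucinate.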
The decision problem
The most concerning statistic isn’t the hallucination rate. It’s the decision rate: how often people act on a hallucinated output without checking it.
The problem isn’t that people know AI hallucinations exist and accept the risk. It’s that they can’t tell the difference. Hallucinated outputs carry the same formatting, the same confident tone, and the same level of detail as accurate outputs. Enterprise users routinely act on AI-generated information without verification — not because they’re careless, but because the outputs look indistinguishable from correct ones.
A fabricated court case comes with the same citation style and level of detail as a real one. A fabricated medical recommendation sounds as confident and specific as an evidence-based one. The output doesn’t come with a reliability score or a warning label. It comes with the same assertive tone as everything else the model produces.
The result is that the burden of verification falls entirely on the user. Every output must be checked against primary sources. Every citation must be verified. Every data point must be confirmed. For the users who do this consistently, AI is a productivity tool. For the users who don’t, AI is a confidence trap.
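The citation check in particular can be made systematic. Below is a minimal sketch, assuming a simplified "volume reporter page" citation format and a hypothetical `trusted_index` set standing in for a lookup against a real legal database. The reporter list is deliberately short, and the citations in the sample draft are arbitrary placeholders, not references to real rulings.

```python
import re

# A deliberately small set of reporter abbreviations; real citation formats vary widely.
REPORTERS = ("U.S.", "F.2d", "F.3d", "F.4th")
CITATION_RE = re.compile(
    r"\b(\d{1,4})\s+(" + "|".join(re.escape(r) for r in REPORTERS) + r")\s+(\d{1,5})\b"
)


def extract_citations(text):
    """Return normalized 'volume reporter page' strings in order of first appearance."""
    seen = []
    for volume, reporter, page in CITATION_RE.findall(text):
        cite = f"{volume} {reporter} {page}"
        if cite not in seen:
            seen.append(cite)
    return seen


def unverified_citations(text, trusted_index):
    """Citations found in text that the trusted index does not contain."""
    return [c for c in extract_citations(text) if c not in trusted_index]


# Placeholder draft and index, both invented for this example.
draft = "Plaintiff relies on 1500 F.3d 200 and on 789 U.S. 1011. See also 1500  F.3d 200."
trusted_index = {"1500 F.3d 200"}

for cite in unverified_citations(draft, trusted_index):
    print(f"NOT FOUND, check by hand: {cite}")
```

A pass from a check like this only shows that a citation exists in the index. It says nothing about whether the case holds what the draft claims, so reading the primary source is still required.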
The liability gap
Courts are beginning to address who’s responsible when AI hallucinations cause harm. The answer, so far, is the user.
Attorneys who submit fabricated case law face sanctions because they have a professional obligation to verify their citations. The fact that AI generated the fabrication doesn’t excuse the failure to check. Courts have been consistent on this point: the lawyer is responsible, not the tool.
But this framework breaks down outside the legal profession. When a patient follows AI-generated medical advice, who’s liable? The patient for trusting it? The platform for generating it? The company that deployed it without adequate safety measures? When an investor loses money on hallucinated financial analysis, is that the investor’s fault for not double-checking every number?
The regulatory framework doesn’t have answers to these questions yet. The EU AI Act, which becomes generally applicable in August 2026, will impose obligations on high-risk AI systems, including those used in healthcare and employment. But hallucination rates, even at their best, are still measured in tenths of a percent, not parts per million. At a 0.7% hallucination rate, every billion interactions produce 7 million false outputs.
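The scale argument is simple multiplication, sketched here with the 0.7% figure quoted above. The daily volumes are illustrative assumptions, since the article gives no exact number, and the calculation assumes the benchmark rate applies uniformly to every interaction, which the summarization caveat earlier suggests is optimistic.

```python
def expected_false_outputs(daily_interactions, hallucination_rate):
    """Expected number of hallucinated outputs per day, rounded to a whole number."""
    return round(daily_interactions * hallucination_rate)


# Illustrative volumes only; these are assumptions, not reported usage figures.
for volume in (1_000_000_000, 5_000_000_000):
    count = expected_false_outputs(volume, 0.007)
    print(f"{volume:,} interactions/day -> {count:,} false outputs/day")
```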
What this means
The AI hallucination problem is often discussed as a technical challenge that better models will solve. The data suggests something different: hallucination rates are dropping, but adoption is growing faster than accuracy is improving. The absolute number of hallucination-caused harms is increasing even as the per-interaction rate decreases.
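The claim that harms can rise while rates fall follows from treating expected false outputs as rate times volume. The sketch below uses hypothetical numbers chosen only to show the mechanism; none of them are measurements.

```python
def harm_ratio(rate_before, rate_after, volume_before, volume_after):
    """Expected false outputs after, divided by before. Above 1 means more total harm."""
    return (rate_after * volume_after) / (rate_before * volume_before)


def breakeven_volume_growth(rate_before, rate_after):
    """Volume growth factor above which total false outputs increase."""
    return rate_before / rate_after


# Hypothetical scenario: the rate halves (2% to 1%) while usage triples.
ratio = harm_ratio(0.02, 0.01, volume_before=1.0, volume_after=3.0)
threshold = breakeven_volume_growth(0.02, 0.01)
print(f"total false outputs change by a factor of about {ratio:.2f}")
print(f"usage must grow less than about {threshold:.1f}x for harms to fall")
```

In this made-up scenario the rate is cut in half, yet total false outputs still grow by about half, because usage grew faster than the break-even factor of two.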
Four models now achieve sub-1% hallucination rates on summarization benchmarks. That’s genuine progress. But the gap between summarization accuracy and domain-specific generation accuracy means the highest-stakes applications — legal, medical, financial — are the ones most likely to produce false information.
The 979 cases in the database are the ones that made it to court or were formally documented. The real number — of medical decisions based on hallucinated advice, financial moves based on fabricated data, hiring decisions based on invented credentials — is unknowable. What’s known is the pattern: more people using AI, in higher-stakes contexts, with the same inability to distinguish real from fabricated.
Originally published at https://noahaust2.github.io/strategist-dashboard/blog/the-hallucination-toll.html