What are AI hallucinations, actually?

"Hallucination" gets used three different ways in the same meeting: as a punchline, as a vague liability, and as the reason a colleague got an uncomfortable letter from a court. Since we spend our days looking at the things, here's the explainer we wish we'd had — what they are, why they happen, and what they tend to look like in documents where mistakes cost something.

A confident answer to a question nobody asked the model to get right

A hallucination is output from a generative model that's presented as fact but isn't grounded in anything real. A case that doesn't exist. A statistic no study produced. A quote nobody said. What makes it dangerous isn't that the model is wrong — people are wrong constantly — but that it's wrong while sounding fluent, specific, and formatted exactly like being right.

The word itself is a bad metaphor. Hallucinating implies a malfunction, a system that normally perceives reality suddenly perceiving something false. Language models don't perceive reality at all. They predict plausible text, one token at a time, based on patterns in their training data. When a model writes a real citation it's doing exactly the same thing as when it writes a fake one: producing something shaped like the citations it has seen. Truth was never an input. Some researchers prefer "confabulation," the clinical term for fluently inventing plausible memories with no intent to deceive, and honestly it fits better. The model isn't lying and it isn't broken. It's doing what it was built to do, on a question it was never built to answer.
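To make "truth was never an input" concrete, here's a toy sketch. It is not how a production language model works: it's a tiny bigram sampler, and the three "training" citations are invented for illustration (no such cases or reporters exist). All it learns is which token tends to follow which, and then it samples.

```python
import random
from collections import defaultdict

# Invented, citation-shaped strings standing in for training data.
# None of these cases or reporters exist.
TRAINING = [
    "Alpha v. Beta, 12 X.R. 345 (2001)",
    "Gamma v. Delta, 47 X.R. 901 (2010)",
    "Alpha v. Delta, 12 Y.R. 77 (2015)",
]

START, END = "<s>", "</s>"


def train(corpus):
    """Record, for each token, every token that followed it in the corpus."""
    follows = defaultdict(list)
    for line in corpus:
        tokens = [START] + line.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            follows[prev].append(nxt)
    return follows


def generate(follows, rng, max_tokens=20):
    """Sample one token at a time until the end marker is drawn."""
    token, out = START, []
    for _ in range(max_tokens):
        token = rng.choice(follows[token])
        if token == END:
            break
        out.append(token)
    return " ".join(out)


follows = train(TRAINING)
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [generate(follows, rng) for _ in range(50)]

# Outputs that look like citations but appear nowhere in the training list.
fabricated = sorted({s for s in samples if s not in TRAINING})

for s in fabricated[:5]:
    print(s)
```

Every pair of adjacent tokens in every sample was seen in training, so each output is locally plausible. Nothing in the sampling loop distinguishes the outputs that happen to match a training line from the recombinations that don't. A real model is vastly more capable than this, but the step being illustrated is the same one: pick a likely next token, repeat.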

That's also why this isn't a bug the next model version will fix. Newer models hallucinate less, and grounding them in real documents helps a lot. But a system whose core operation is "generate plausible text" will always be able to generate plausible falsehoods. The rate goes down; the category stays. And there's a catch: as the models improve, the surviving fabrications get more convincing. The obvious tells of 2023 are mostly gone. The fabrication isn't.

What they look like in practice

Most of what we see falls into a few recurring shapes.

The phantom. A source that simply doesn't exist — an invented case, a fake paper, a standard nobody wrote. This is the famous one, and the easiest to catch if anyone looks. In the Srinagar judgment we wrote about last week, a High Court went looking for one of the two main precedents in a lower court's order and concluded the cited judgment "does not appear to exist."

The mismatch. Everything in the reference is individually real, but the pieces don't belong together. The case name is genuine but the citation points somewhere else; the DOI resolves, but to an unrelated paper. Sneakier than the phantom, because the lazy check ("does this case exist? yes, fine") passes. The other flagged authority in that same order was exactly this.

The misquote. Source exists, citation is correct, and the source doesn't say what the document claims it says. Somewhere between the actual holding and the argument being built, the proposition drifted into the holding-shaped sentence that fit best. This is the hardest type to catch, because it survives the verification step most people actually do — confirming the source is real — and only fails when someone reads the source against the claim. The Srinagar court told judges to quote passages verbatim rather than paraphrase, and this is why. Paraphrase is where drift hides.

The invented fact. A precise-sounding number, date, or threshold with nothing behind it. These turn up constantly in summaries, where the model fills gaps in the source material with plausible filler. "Compliance costs rose 23%" is exactly as easy to generate when no such figure exists as when one does.

The helpful agreement. The subtle one. Ask a model to "summarise the cases supporting our position" and it may manufacture the support, because the prompt assumed support existed. You supplied the conclusion; it supplied the evidence.
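The first three shapes line up with three successively deeper checks: does the source exist, do the pieces of the reference belong together, and does the source contain what's attributed to it. Here's a minimal sketch of that ladder, under loud assumptions. `Source`, `Reference`, `INDEX` and `classify` are hypothetical names, the index is a hand-written dictionary standing in for a real case-law database or DOI resolver, and every entry in it is invented. The last check is a plain verbatim match, so it only works on direct quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    title: str  # case name or paper title
    text: str   # full text of the source


@dataclass(frozen=True)
class Reference:
    title: str     # what the document calls the source
    citation: str  # where the document says to find it
    quote: str     # passage the document attributes to it


# Hypothetical trusted index keyed by citation. All entries are invented.
INDEX = {
    "12 X.R. 345": Source("Alpha v. Beta",
                          "Notice must be given in writing before the hearing."),
    "47 X.R. 901": Source("Gamma v. Delta",
                          "The appeal was dismissed for want of jurisdiction."),
}


def normalise(s):
    """Lower-case and collapse whitespace so layout differences don't matter."""
    return " ".join(s.lower().split())


def classify(ref, index):
    source = index.get(ref.citation)
    title_exists = any(s.title == ref.title for s in index.values())
    # Check 1: is any part of the reference known to the index?
    if source is None and not title_exists:
        return "phantom"
    # Check 2: do the citation and the title point at the same source?
    if source is None or source.title != ref.title:
        return "mismatch"
    # Check 3: does the source contain the quoted passage verbatim?
    if normalise(ref.quote) not in normalise(source.text):
        return "misquote"
    return "verified"


refs = [
    Reference("Epsilon v. Zeta", "99 Z.R. 1", "Oral notice is sufficient."),
    Reference("Alpha v. Beta", "47 X.R. 901", "Notice must be given in writing"),
    Reference("Alpha v. Beta", "12 X.R. 345", "Oral notice is sufficient."),
    Reference("Alpha v. Beta", "12 X.R. 345", "Notice must be given in writing"),
]
results = [classify(r, INDEX) for r in refs]
print(results)  # ['phantom', 'mismatch', 'misquote', 'verified']
```

Note what the sketch can't do. A paraphrased proposition has no verbatim string to match, so the third check still needs a person reading the source against the claim. The invented fact and the helpful agreement don't come with a reference to look up at all, so they fall outside this ladder entirely.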

Why "just check everything" keeps not happening

Every professional body's guidance lands on the same instruction: a human verifies all AI output against real sources. Correct, obviously. But notice what it asks. Tracing every reference, confirming the match, reading the passage against the claim — done properly, that can take longer than the drafting time the AI saved, which quietly kills the reason the tool was adopted. People don't skip verification because they're careless. They skip it because verification consumes exactly the hours the tool was meant to give back.

It doesn't help that we calibrate trust on surface signals. Confidence, specificity, clean formatting — a fabricated citation in perfect house style feels checked. Reviewers chase down the claims that look shaky and wave through the ones that look authoritative, which is precisely backwards here.

And there's the problem the Srinagar case makes vivid: you usually can't tell whether AI was involved at all. A mistyped citation and a fabricated one look identical on the page. Once these tools are plausibly anywhere in the workflow (yours, your associate's, the other side's), origin stops mattering. The only thing you can actually observe is whether a claim was verified.

That last point is, more or less, why our company exists: verification only happens reliably when it's cheap enough to be the default. But that's a different post. The point of this one is simpler. Hallucinations aren't an embarrassing phase the technology is growing out of. They're the standing cost of systems that produce plausibility rather than retrieve truth — and the bill lands hardest in professions where a confident falsehood has consequences.