Glossary/Hallucination

Definition

AI hallucination is when a language model generates information that is factually incorrect, fabricated, or not grounded in the provided source material — but presents it with the same confident, fluent tone as accurate information. Hallucination is an intrinsic property of current LLMs, not a bug that can be fully fixed, but it can be significantly reduced with the right techniques.

Why hallucination happens — the real explanation

LLMs are not knowledge retrieval systems — they are learned probability distributions over token sequences. At every generation step, the model computes

P(next token = w | context) = softmax(W_U · h_t / T)_w

and samples the next token from this distribution, where h_t is the hidden state at step t, W_U is the unembedding matrix, and T is the temperature. There is no "fact-check" step — only statistical plausibility.
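As a concrete illustration, the softmax-with-temperature step can be sketched in a few lines of standalone Python (the logits are toy numbers, not any real model's weights):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into next-token probabilities.
    Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for four candidate next tokens (purely illustrative)
logits = [2.0, 1.0, 0.5, -1.0]
cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)
# At low temperature the top token absorbs more probability mass
```

Nothing in this computation consults a knowledge base: the distribution is shaped entirely by learned parameters, which is exactly why a statistically plausible falsehood can outrank a true but rare continuation.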

When asked about a low-probability or out-of-distribution topic (rare statistics, obscure papers, recent events), the model generates what a plausible response would look like given its training distribution — not what is actually true. The model has no epistemic awareness: it cannot distinguish 'I know this' from 'this sounds like what the answer would look like'.

The confident wrong answer

Hallucinations are worst when the question is well-posed (the model "knows" the format of a correct answer) but the specific content is outside the training distribution. A model confidently fabricating an APA citation looks exactly like a real citation — this is why hallucination is especially dangerous for academic and professional use.

Types of hallucination

| Type | Description | Example | Danger level |
| --- | --- | --- | --- |
| Factual hallucination | Wrong dates, statistics, names | "Einstein won the Nobel in 1925" (actually 1921) | Medium — verifiable |
| Citation hallucination | Invented papers with real-sounding metadata | "Smith et al. (2019), Nature, p.42" — paper doesn't exist | High — hard to detect without library access |
| Contextual hallucination | Contradicts information given in the prompt | Document says "Q3 revenue was $5M"; model says $8M | High — trust-breaking |
| Confabulation | Internally consistent but entirely fabricated story | Detailed biography of a person who doesn't exist | Very high — very convincing |
| Action hallucination | Claims to have done something it didn't do | "I searched the web and found..." (no tool was called) | Medium — workflow-breaking |
| Package hallucination | Invents library functions/APIs that don't exist | import pandas as pd; pd.read_json_fuzzy() | Medium — breaks code |
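The contextual-hallucination example above ($5M in the document, $8M in the answer) suggests a cheap automated guard: scan the answer for numbers that never appear in the source. A toy sketch (real pipelines use entailment models; the regex here is an illustrative simplification):

```python
import re

def _numbers(text):
    """Extract numeric tokens, trimming trailing punctuation ('4.2M,' -> '4.2')."""
    return [n.rstrip(".,") for n in re.findall(r"\d[\d.,]*", text)]

def check_numbers_grounded(source, answer):
    """Return numbers in the answer that never appear in the source:
    a crude guard against contextual hallucination (numeric drift)."""
    source_numbers = set(_numbers(source))
    return [n for n in _numbers(answer) if n not in source_numbers]

source = "Q3 revenue was $5M, up from $4.2M in Q2."
answer = "The report states Q3 revenue was $8M."
print(check_numbers_grounded(source, answer))  # ['8']
```

A check this crude misses paraphrased contradictions, but it costs microseconds and flags exactly the kind of confident numeric drift that breaks trust.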

Phantom imports in code generation

Studies have found that up to 20% of LLM-generated Python packages in code completions are hallucinated — the package name looks plausible but doesn't exist on PyPI. Always run pip install and unit tests before deploying AI-generated code.
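One cheap pre-deployment check is to resolve every import before the code ever runs. A minimal sketch using only the standard library (the phantom package name below is made up for illustration):

```python
import importlib.util

def find_phantom_imports(module_names):
    """Return the names that cannot be resolved in the current environment.
    importlib.util.find_spec() returns None for a top-level module
    that does not exist."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

# "json" is stdlib; "pandas_fuzzy_extras" is a made-up name for illustration
print(find_phantom_imports(["json", "pandas_fuzzy_extras"]))  # ['pandas_fuzzy_extras']
```

For packages that are merely not installed yet, the same idea works against PyPI's JSON API: https://pypi.org/pypi/<name>/json returns 404 for a package that does not exist.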

Hallucination risk by task

| Task type | Hallucination risk | Best mitigation |
| --- | --- | --- |
| Simple factual Q&A on famous topics | Low (frontier models) | None needed for well-known facts |
| Specific citations / paper references | Very high (all models) | Always use RAG or scholarly databases |
| Medical / legal specific claims | High — dangerous | RAG + human expert review required |
| Code generation (popular libraries) | Low–Medium | Run tests; check API signatures |
| Code generation (niche/new libraries) | High (phantom imports) | Always verify against official docs |
| Recent events (post-cutoff) | Very high | Enable web search tools |
| Mathematical proofs | Medium (subtle errors) | Verify with CAS or formal checker |

RAG dramatically reduces hallucination for document-grounded tasks. Studies show hallucination rates dropping from ~30% for open-ended GPT-4 to ~5% when the model is provided source documents and asked to cite them. However, models can still hallucinate by misquoting or contradicting provided documents (contextual hallucination), especially in long contexts.
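The grounding pattern described above largely comes down to prompt assembly. A minimal sketch; the instruction wording, source-id format, and refusal phrase are illustrative choices, not a standard:

```python
def build_grounded_prompt(question, chunks):
    """Assemble a RAG prompt that restricts the model to retrieved chunks.
    chunks: list of (source_id, text) pairs from your retriever."""
    context = "\n\n".join(f"[{src}] {text}" for src, text in chunks)
    return (
        "Answer using ONLY the sources below, and cite the source id in "
        "brackets for every claim. If the answer is not in the sources, "
        "reply 'Not found in the provided documents.'\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "What was Q3 revenue?",
    [("report-p4", "Q3 revenue was $5M, up 19% year over year.")],
)
```

Including a stable source id per chunk is what makes the citations checkable afterwards: a cited id that was never retrieved is itself a hallucination signal.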

How to detect and reduce hallucination

No foolproof hallucination detector exists — but several effective strategies reduce both occurrence and impact:

  • RAG (Retrieval-Augmented Generation) — ground every answer in source documents retrieved at query time. Force citation of specific pages/chunks. If the answer isn't in the retrieved context, the model should say so.
  • Chain-of-thought prompting — asking models to reason step-by-step reduces hallucination by exposing the reasoning chain where inconsistencies become visible.
  • Consistency sampling — generate k responses independently and look for agreement. If 4/5 answers agree, confidence is higher. Divergent answers flag uncertainty.
  • Tool use — give models access to a calculator, code interpreter, or web search to verify claims externally rather than relying on parametric memory.
  • Confidence calibration — prompt the model to explicitly rate its confidence and to say "I don't know" when uncertain. Well-calibrated models (Claude, GPT-4) do this reasonably.

Self-consistency decoding to detect hallucinations

import anthropic

client = anthropic.Anthropic()

def self_consistency_check(question: str, n_samples: int = 5) -> dict:
    """
    Generate n independent answers and measure agreement.
    High variance across answers signals uncertain / hallucination-prone territory.
    """
    answers = []
    for _ in range(n_samples):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=200,
            temperature=0.8,   # non-zero to get variation
            messages=[{"role": "user", "content": question}]
        )
        answers.append(response.content[0].text.strip())

    # Rough agreement metric: count unique answers
    unique = len(set(answers))
    agreement = 1 - (unique - 1) / max(n_samples - 1, 1)

    return {
        "answers": answers,
        "agreement_score": round(agreement, 2),
        "confidence": "high" if agreement > 0.8 else "uncertain",
        "note": "Low agreement → model is uncertain → verify externally"
    }

result = self_consistency_check(
    "What was Claude Shannon's exact PhD thesis title?",
    n_samples=5
)
print(f"Agreement: {result['agreement_score']} — {result['confidence']}")

Why it's especially dangerous for students

Students face a unique hallucination risk: they may lack the domain expertise to recognize when an AI's answer is wrong. The AI's confident, well-formatted output can be indistinguishable from accurate information — even to instructors.

| Scenario | Risk | Consequence |
| --- | --- | --- |
| Citing AI-generated references | Very high | Academic misconduct + failed assignment |
| Medical question (diagnosis/dosage) | Critical | Direct patient harm if acted upon |
| Legal question (case law) | High | Wrong legal strategy; citation of non-existent case law |
| Math problem solving | Medium | Plausible-looking but wrong derivation |
| Historical dates / attributions | Medium | Wrong facts in essays or exam answers |

Document-grounded AI is safer

Tools that retrieve answers from your actual uploaded textbook and cite the exact page number — like RAG-based study assistants — are dramatically safer for academic work than open-ended chat. Every cited page can be checked against your own copy in seconds, so a fabricated reference becomes immediately obvious rather than silently convincing.

Practice questions

  1. What are the three types of LLM hallucination and examples of each? (Answer: (1) Factual hallucination: stating false facts confidently — inventing citations, wrong dates, non-existent people. Example: 'The Battle of Hastings was in 1067' (correct: 1066). (2) Faithfulness hallucination: generating content not supported by provided source documents. Example: summarizing a document and adding claims not in the original. (3) Reasoning hallucination: logical errors in chain-of-thought. Example: 'All dogs are mammals. Fido is a mammal, therefore Fido is a dog.' Factual and faithfulness hallucinations are most studied; reasoning hallucinations are an active research area.)
  2. What is the 'sycophancy' problem in LLMs and how does it relate to hallucination? (Answer: Sycophancy: LLMs agree with user beliefs even when those beliefs are factually incorrect. 'Einstein failed math as a child' (false) — if the user states this confidently, sycophantic models agree rather than correcting. Sycophancy is a form of hallucination: the model generates false content to satisfy perceived user preferences rather than being accurate. Root cause: RLHF training on human preferences, where humans often preferred agreeable responses over correct-but-disagreeable ones. Mitigation: RLHF with honesty metrics, Constitutional AI, debate training.)
  3. What is RAG (Retrieval-Augmented Generation) and how does it reduce hallucination? (Answer: RAG grounds LLM generation in retrieved documents: search a knowledge base for relevant passages, inject them into the context, instruct the model to answer only from provided sources. Reduces hallucination because: the model has explicit evidence to cite; the prompt can include 'answer only from the provided documents — say I don't know if not covered.' Remaining risks: faithfulness hallucination (misrepresenting what the document says), retrieval failures (relevant doc not retrieved), and models still hallucinate when instructed poorly.)
  4. How do you evaluate hallucination in LLM outputs? (Answer: (1) FactScore (Min et al. 2023): decompose response into atomic claims, verify each claim against a knowledge source (Wikipedia). Report percentage of verifiable claims that are true. (2) RAGAS: evaluates RAG faithfulness — does the answer follow from the retrieved context? (3) TruthfulQA benchmark: tests model on questions where common misconceptions exist — does the model give the true or popular-but-false answer? (4) Human evaluation: domain experts check specific claims against authoritative sources. FactScore is the current standard for open-domain hallucination evaluation.)
  5. What techniques reduce hallucination at inference time without retraining? (Answer: (1) Temperature=0: greedy decoding reduces creative fabrication. (2) Self-consistency: sample N responses, take majority vote — inconsistent claims are likely hallucinations. (3) Chain-of-thought: making the model reason step-by-step exposes errors before the final answer. (4) Cite-then-answer: require the model to quote specific sources before stating facts. (5) Uncertainty elicitation: prompt 'If unsure, say so' — trains output style to include appropriate hedging. (6) Knowledge boundary prompts: 'Only answer if you are certain this fact is in your training data.')
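Several of the inference-time techniques in question 5 are pure prompt patterns. A sketch combining cite-then-answer with uncertainty elicitation; the wording is illustrative and should be tuned per model:

```python
def inference_time_guard(question):
    """Wrap a question with cite-then-answer plus uncertainty elicitation
    (techniques 4 and 5 above); pair with temperature=0 for best effect."""
    return (
        "Before answering, list the specific facts you are relying on and "
        "state for each whether you are certain or unsure. If any required "
        "fact is unsure, answer 'I don't know' instead of guessing.\n\n"
        f"Question: {question}"
    )

guard = inference_time_guard("When was the transistor invented?")
```

The point is not that the model magically knows its own boundaries, but that eliciting the supporting facts first gives you (and the model) something concrete to check before the final answer is committed.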

Famous real-world AI hallucinations: a field guide

Hallucinations are not theoretical. These documented cases shaped how developers, enterprises, and regulators now think about AI reliability.

| Case | Year | What happened | Consequence | Lesson |
| --- | --- | --- | --- | --- |
| Mata v. Avianca (legal brief) | 2023 | Lawyer filed a brief citing 6 cases ChatGPT invented — complete with fake judges, courts, and opinions | Lawyer sanctioned $5,000; public humiliation; landmark AI-in-court precedent | Never cite AI-sourced legal cases without manual verification in official legal filings |
| Air Canada chatbot refund policy | 2024 | Air Canada's AI support bot told a grieving customer he could claim a bereavement discount retroactively; the company later argued it wasn't liable for the bot's statements | Tribunal ruled Air Canada liable and ordered it to honour the discount, paying roughly CA$650 in damages plus fees | Operators are legally liable for their AI's factual claims to customers |
| Google Gemini historical figures | 2024 | Gemini generated images of Black and Asian Nazi soldiers when asked to depict historical figures, applying diversity to contexts where accuracy was required | Google pulled image generation of people for months; significant reputational damage | Generative AI systems need domain-aware guardrails, not just general diversity training |
| ChatGPT medical misinformation | 2023 | Studies found GPT-3.5 gave incorrect medication dosages and fabricated drug interactions in ~21% of pharmacology queries | Multiple papers published; FDA began monitoring; warnings issued against using LLMs for prescribing | Medical applications require RAG from authoritative clinical databases, not parametric memory |
| Hallucinated academic citations | Ongoing | Studies find 30–50% of AI-generated reference lists contain completely invented papers with plausible authors, titles, and journals | Essays and papers submitted with fake references; academic integrity cases at universities globally | Always verify every AI-generated citation against a real scholarly database |
| Amazon hiring tool bias (discriminative) | 2018 | Amazon's ML hiring tool gave lower scores to resumes containing the word "women's", having learned discriminatory patterns from biased training data | System scrapped; influenced EU AI Act requirements for HR AI auditing | AI can learn spurious social patterns from skewed training distributions, not just fabricate facts |

The academic integrity crisis

A 2024 Stanford study found that 14% of undergraduate submissions at major US universities contained AI-generated content, with detectable hallucinations in roughly 23% of those. The problem is not that students used AI — it's that they submitted AI hallucinations as facts. The most common failure: AI-generated references that don't exist. If you use AI for research, verify every single factual claim and citation against the original source.

| Model family | Hallucination rate (TruthfulQA 2025) | Citation fabrication risk | Medical accuracy | Notes |
| --- | --- | --- | --- | --- |
| GPT-4o | ~82% truthful | Medium (~15% of citations) | Good with grounding | Best general-purpose; still fabricates in niche domains |
| Claude 3.7 Sonnet | ~85% truthful | Low–medium (~10%) | Strong with RAG | Constitutional AI training reduces confident falsehoods |
| Gemini 1.5 Pro | ~80% truthful | Medium | Good with search grounding | Strong when Search grounding is enabled |
| DeepSeek-V3 | ~78% truthful | Medium–high | Varies | Weaker on English-language factuality vs top US models |
| LLaMA 3 70B (base) | ~72% truthful | High without RAG | Requires grounding | Base model — fine-tuned versions perform better |
| Smaller models (<7B) | 50–65% truthful | Very high | Unreliable | Never use for factual applications without RAG + grounding |

On LumiChats

LumiChats Study Mode dramatically reduces hallucination for document-based questions by using RAG: every answer is grounded in your uploaded textbook, the model is instructed to cite the page number, and it is restricted from drawing on outside knowledge.

Try it free

✦ Under $1 / day

Practice what you just learned

Quiz Hub + Study Mode lock in every concept. 40+ AI models, Agent Mode, page-locked answers — all for less than a dollar a day.

Start Free — Under $1/day
