
Britannica Is Suing OpenAI. Here's Why It Could Break AI.

Shikhar Burman · June 3, 2026 · 13 min read

Britannica and Merriam-Webster are suing OpenAI over RAG — the live retrieval behind every AI answer. If they win, AI search changes forever.

Insight

⚡ Published June 3, 2026 — every claim in this article is sourced and verifiable. Key facts: On March 13, 2026, Encyclopædia Britannica and its subsidiary Merriam-Webster filed suit against OpenAI in the U.S. District Court for the Southern District of New York (Case No. 1:26-cv-02097), alleging massive copyright infringement across nearly 100,000 of their copyrighted articles. The complaint targets two things at once: the training data used to build ChatGPT, and — unusually — OpenAI's retrieval-augmented generation (RAG) system, the live process that pulls real web content into answers as you ask. It adds a Lanham Act claim arguing ChatGPT attributes fabricated 'hallucinations' to Britannica by name. OpenAI, valued around $730 billion, responded that its models are 'trained on publicly available data and grounded in fair use.' This is the most important detail most coverage misses: the fair use defense that has won training-data cases does not protect live RAG reproduction nearly as well. That single distinction is why this case could reprice the entire AI search industry.

Encyclopædia Britannica has survived since 1768 — through the printing press, the Industrial Revolution, the CD-ROM, and the rise of Wikipedia. Today it is a digital education platform delivering content in more than 20 languages to over 150 million students worldwide, and its business runs on subscriptions and advertising that depend on people visiting its websites. In March 2026, alongside Merriam-Webster — America's leading dictionary publisher for more than 180 years — it made what may be its most consequential strategic move in decades: a lawsuit against OpenAI. What makes the filing dangerous for the AI industry is not that another publisher is angry about training data. It is the specific legal theory the complaint introduces, one that previous cases largely left untouched, and one that strikes at the part of modern AI that has the weakest legal cover.

Training Data vs RAG: The Distinction That Changes Everything

To see why this case is different, you have to separate two ways an AI uses content. The first is training data: the copyrighted material absorbed into a model's weights while it learns. Almost every major AI copyright fight so far, from The New York Times case to the author class actions, has centered on training. OpenAI's core defense there is fair use. Training on copyrighted works to build a genuinely new kind of tool is transformative, the argument goes, much as a person reads thousands of books to develop expertise without licensing each one. That argument has mostly been winning. In Bartz v. Anthropic and Kadrey v. Meta, judges treated training generative models as highly transformative (Thomson Reuters v. Ross, involving a non-generative legal-search tool, went the other way). That track record is precisely why a publisher looking to actually stop OpenAI needed a different angle.

The second way AI uses content is RAG — retrieval-augmented generation. When you ask ChatGPT a current question, the system does not rely only on its trained memory; it retrieves live passages from the web (or a curated database) and weaves them into its answer in real time. Britannica alleges that when users ask questions its encyclopedia and dictionary answer well, ChatGPT's RAG system fetches Britannica content and reproduces substantial portions of it in the response — without a license and without sending the user to Britannica's site. The complaint's phrasing is blunt: ChatGPT 'starves web publishers of revenue' by absorbing their content and delivering a polished answer where a search engine would have sent a visitor. This is not a memory-of-training claim. It is live, on-demand reproduction — and fair use is far harder to argue when the AI's output directly substitutes for the original work at the exact moment of use.
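To make the training/retrieval distinction concrete, here is a deliberately tiny sketch of the retrieval step. It is an illustration under stated assumptions, not OpenAI's system: the three-document corpus, its document IDs, and the word-overlap scorer are all hypothetical stand-ins for a live web index and a production ranker (real systems typically use BM25 or embedding similarity). The point it shows is where third-party text enters the answer: at question time, pasted into the prompt.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for "the live web": document ID -> text.
# A real RAG system would query a search index or vector store at request time.
CORPUS = {
    "encyclopedia/photosynthesis": "Photosynthesis is the process by which green plants "
                                   "use sunlight to make sugar from carbon dioxide and water.",
    "dictionary/serendipity": "Serendipity: the faculty or phenomenon of finding valuable "
                              "or agreeable things not sought for.",
    "news/weather": "Heavy rain is expected across the region this weekend.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, corpus: dict[str, str], k: int = 1) -> list[tuple[str, str]]:
    """Rank documents by word overlap with the query and return up to k matches.

    Toy scorer for illustration only; documents with zero overlap are dropped.
    """
    q = Counter(tokenize(query))
    scored = []
    for doc_id, text in corpus.items():
        d = Counter(tokenize(text))
        overlap = sum(min(count, d[word]) for word, count in q.items())
        scored.append((overlap, doc_id, text))
    # Highest overlap first; ties broken alphabetically by document ID.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(doc_id, text) for score, doc_id, text in scored[:k] if score > 0]


def build_prompt(query: str, passages: list[tuple[str, str]]) -> str:
    """Paste the retrieved passages, tagged with their source IDs, into the prompt.

    This is the step where a publisher's text is copied into the model's input
    at the moment the user asks, independent of anything stored in the weights.
    """
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return f"Answer using only the sources below and cite them.\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    question = "What does serendipity mean?"
    hits = retrieve(question, CORPUS)
    print(build_prompt(question, hits))
```

Nothing in this sketch touches model weights: the dictionary entry is fetched and copied into the prompt only when the question arrives. That is the mechanical reason the complaint frames retrieval as live reproduction rather than learned memory.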

The Lanham Act Twist: Suing Over Hallucinations

The complaint then adds a second theory that most AI copyright suits have not attempted. Alongside four copyright counts, it asserts a Lanham Act claim for false designation of origin and trademark dilution. The argument: when ChatGPT generates fabricated content (a hallucination) and attributes it to Britannica or Merriam-Webster by name, it misleads users into thinking those trusted brands authored or endorsed the false information. Britannica's reputation rests on accuracy built over more than 250 years; associating that name with AI-invented errors, the complaint argues, causes reputational harm that copyright damages alone do not capture. The theory remains largely untested in court as applied to AI hallucinations. If it succeeds, it creates an entirely new category of liability for any AI system that generates incorrect content while implying an authoritative source, which describes essentially every chatbot on the market.

Where This Case Sits in the AI Copyright War

By April 2026, trackers counted more than 160 active AI copyright lawsuits worldwide. The Britannica filing does not stand alone — it lands inside an already-crowded battlefield, and the outcomes so far cut in different directions. Understanding the landscape is the only way to judge how much weight the RAG theory really carries.

| Case | What it targets | Status / outcome |
|---|---|---|
| Britannica & Merriam-Webster v. OpenAI | Training data and live RAG reproduction, plus a Lanham Act hallucination claim | Filed March 13, 2026, SDNY. The RAG and trademark theories are the novel, high-stakes elements. |
| New York Times v. OpenAI & Microsoft | Training data and verbatim "regurgitation" of articles in responses | Discovery ongoing; summary judgment scheduled for April 2026; NYT seeking billions. Judge Stein declined to dismiss output-reproduction claims. |
| Bartz et al. v. Anthropic | Training Claude on books, including pirated copies | Court ruled that training is fair use but storing pirated copies is not. Settled for $1.5 billion, the largest copyright settlement in U.S. history. |
| Kadrey et al. v. Meta | Training Llama on authors' books | Summary judgment for Meta on fair use for training; claims over pirated-book "seeding" remain active in N.D. California. |
| Thomson Reuters v. Ross Intelligence | Using Westlaw headnotes to train an AI legal-search tool | Summary judgment for Thomson Reuters: not fair use. On appeal at the Third Circuit. |
| Getty Images v. Stability AI | Training an image generator on Getty's photo library | UK High Court (Nov 2025) rejected the secondary copyright claim, a setback for rights holders abroad. |
| Britannica & Merriam-Webster v. Perplexity | Real-time scraping and reproduction in an AI answer engine (RAG) | Filed September 2025, SDNY; still proceeding. The direct precursor to the OpenAI suit. |

Read together, the pattern is clear. Where AI companies have been winning is on training as fair use — Anthropic and Meta both got favorable training rulings, even as Anthropic paid $1.5 billion over how it acquired the books. Where they are far more exposed is on output: Judge Sidney Stein, who oversees the consolidated OpenAI litigation in the SDNY, declined to throw out claims that short reproductions in ChatGPT's responses may infringe absent fair use. The Britannica suit is engineered to attack exactly that softer flank — and to do it against RAG, where the reproduction is live, current, and directly competitive with the source.

Why It Could 'Break' AI — and What Each Outcome Means

  • If Britannica wins on the RAG theory: AI companies would likely need to license content before including it in retrieval — creating a new content-licensing market for AI search, much like music streaming licenses songs. That repricing would hit every product built on live retrieval: ChatGPT search, Perplexity, Google's AI Overviews, Gemini, and Copilot. It would not literally 'break' AI, but it could break the current free-content economics that make AI search cheap.
  • If OpenAI wins on fair use: a ruling that RAG retrieval is fair use would validate the architecture of nearly every AI search product and remove the most acute near-term legal risk hanging over the industry. It would also embolden aggressive retrieval, accelerating the publisher-traffic collapse that started the fight.
  • The Merriam-Webster angle is quietly the sharpest: dictionary definitions are among the most frequently retrieved facts in all of AI search. A ruling that definitions must be licensed for retrieval would touch virtually every assistant on the market.
  • Settlement is the most likely ending: given the financial and reputational risk to both sides — and the template set by Anthropic's $1.5 billion deal — a negotiated licensing framework is the probable outcome. If it comes, the terms will tell us more about the future of AI than any verdict would.

This Is a Global Fight, Not an American One

It would be a mistake to read this as a purely U.S. story. Courts around the world are already drawing lines around AI and copyright, and they are not all drawing them the same way. In the UK, Getty's secondary-copyright claim against Stability AI failed in late 2025. In China — the second pole of the global AI race — courts have moved aggressively on AI output. The Hangzhou Internet Court found that an AI platform could be liable for copyright-infringing images its users generated, establishing that platforms cannot dodge responsibility by blaming users, while Beijing courts have issued a series of rulings on whether AI-generated images are even copyrightable. The throughline matters: jurisdictions that are otherwise fierce competitors are converging on the idea that AI output is not automatically exempt from copyright. That is the same nerve the Britannica RAG theory presses — which is exactly why this case is being watched in Silicon Valley, Beijing, Brussels, and every market where AI search is replacing the click.

Insight

The practical takeaway for anyone who relies on AI answers: treat retrieved AI summaries as a starting point, not a citation. The same RAG systems at the center of this lawsuit are why AI answers can confidently attribute a fabricated fact to a real source. When accuracy matters (a definition, a date, a legal or medical detail), use AI to find the source, then verify against the original. Tools like Perplexity surface their citations so you can click through, although citing sources did not spare Perplexity from Britannica's earlier suit. A polished answer with no traceable source is the hardest kind to check, and it is the substitution-without-a-click behavior at the heart of this complaint.

Pro Tip

If you are following AI copyright law, watch the New York Times case more closely than the Britannica case itself. The NYT case is further along — summary judgment was scheduled for April 2026 — and it covers both training and output reproduction. Whatever fair use framework that case establishes will become the boundary inside which the Britannica RAG theory has to operate. The NYT outcome, when it lands, may be the single most consequential legal event for the AI industry to date — and the Britannica suit is the test of how far the resulting rules extend into live retrieval.

Frequently Asked Questions
01. What exactly are Britannica and Merriam-Webster suing OpenAI over?

They filed suit on March 13, 2026 in the Southern District of New York alleging OpenAI used nearly 100,000 of their copyrighted articles without permission. The complaint targets both training data and OpenAI's live RAG retrieval system, and adds a Lanham Act claim that ChatGPT misleadingly attributes fabricated 'hallucinations' to their trusted brand names.

02. Why is the RAG claim more dangerous to OpenAI than a normal training claim?

Training-on-copyrighted-data has repeatedly been treated by courts as transformative fair use, which is why AI companies tend to win those arguments. RAG is different: it reproduces current content live, at the moment a user asks, and directly substitutes for visiting the source. Fair use is much harder to argue for that kind of competitive, real-time reproduction.

03. What is the Lanham Act hallucination theory?

The Lanham Act, the main U.S. federal trademark statute, bars false designations of origin: representations likely to confuse people about who made, sponsored, or endorsed something. Britannica argues that when ChatGPT invents incorrect information and attributes it to Britannica or Merriam-Webster by name, users are misled into thinking those brands endorsed false content, causing reputational harm beyond copyright. The theory remains largely untested as applied to AI hallucinations.

04. Could this lawsuit really break AI search?

Not literally, but it could reprice it. If courts rule that content must be licensed before it can be used in retrieval, every product built on live retrieval — ChatGPT search, Perplexity, Google AI Overviews, Gemini, Copilot — would face new licensing costs. That would not end AI search, but it could end the free-content economics it currently runs on.

05. How does this compare to the Anthropic and New York Times cases?

Anthropic won the argument that training is fair use but settled for $1.5 billion over storing pirated books, the largest copyright settlement in U.S. history. The NYT case targets both training and verbatim reproduction in responses, with summary judgment scheduled for April 2026. Britannica's suit is built to attack the output and retrieval side, where AI companies are most legally exposed.

06. How is the rest of the world handling AI and copyright?

Differently, and it matters. A UK court rejected Getty's secondary-copyright claim against Stability AI in late 2025, while Chinese courts have held AI platforms liable for infringing output and issued multiple rulings on AI-generated image copyrightability. The shared thread across competing jurisdictions is that AI output is not automatically exempt from copyright — the same principle the Britannica RAG theory is built on.

Written by Shikhar Burman

Co-Founder and CTO of LumiChats. Writes technical deep-dives on AI systems, infrastructure, and how large language models actually work under the hood.