AI & Society · Shikhar Burman · 23 March 2026 · 11 min read

Britannica and Merriam-Webster Are Suing OpenAI: What the RAG Copyright Case Means for AI's Future

Encyclopedia Britannica and Merriam-Webster filed suit against OpenAI in March 2026 — and unlike previous AI copyright cases, this one targets RAG (Retrieval-Augmented Generation) directly. The lawsuit alleges that ChatGPT reproduces Britannica's copyrighted content through its RAG workflow when generating responses, not just through training data. This could be the most consequential AI legal case of 2026. Here is exactly what was filed, why RAG specifically is at issue, and what the outcome could mean for every AI company.

The Encyclopedia Britannica has survived since 1768 — through the printing press, the industrial revolution, the encyclopedia on CD-ROM, and the rise of Wikipedia. In March 2026, the company made what may be its most consequential strategic decision in decades: filing a lawsuit against OpenAI that targets not the training of AI models, but the retrieval systems that power AI responses. Joined by Merriam-Webster (the most authoritative American dictionary, continuously published since 1828), the lawsuit introduces a legal theory that distinguishes it from all previous AI copyright cases and could fundamentally reshape how AI search and retrieval systems are built and licensed.

The Critical Distinction: Training Data vs. RAG

To understand why this lawsuit is different, you need to understand the difference between two ways AI uses content. Previous major AI copyright lawsuits — the New York Times case, the author lawsuits — primarily target training data: the copyrighted content used to train the model's weights during the learning process. OpenAI's main defense has been fair use: the argument that training on copyrighted content to build a new type of product is transformative use, similar to how a human reads books to develop expertise without licensing each one. Courts have not yet resolved this argument definitively.

The Britannica/Merriam-Webster lawsuit targets something different: RAG. Retrieval-Augmented Generation is a system where, when a user asks ChatGPT a question, the model dynamically retrieves passages from current web content (or a curated document database) and incorporates those retrieved passages into its response. The lawsuit alleges that when users ask ChatGPT questions that Britannica articles answer well, ChatGPT's RAG system retrieves Britannica content from the web and reproduces substantial portions of it in its responses — without a license and without directing users to Britannica's site. This is not training data copyright — it is live content reproduction, and the fair use argument is significantly harder to make for live reproduction than for training data use.
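To make the retrieval-then-generation flow concrete, here is a minimal sketch of a RAG pipeline. It uses a toy in-memory corpus and keyword-overlap scoring in place of real web search and embedding-based retrieval; the corpus entries, function names, and prompt format are all illustrative assumptions, not OpenAI's actual implementation.

```python
# Minimal RAG sketch: a toy corpus plus keyword-overlap retrieval.
# Real systems use live web search and embedding similarity instead.

CORPUS = {
    "britannica/photosynthesis": (
        "Photosynthesis is the process by which green plants convert "
        "light energy into chemical energy."
    ),
    "britannica/encyclopedia": (
        "An encyclopedia is a reference work containing articles on "
        "many subjects or on one particular subject."
    ),
}

def retrieve(query: str, k: int = 1) -> list[tuple[str, str]]:
    """Rank documents by word overlap with the query; return top-k."""
    q_words = set(query.lower().split())
    scored = sorted(
        CORPUS.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Splice retrieved passages into the prompt sent to the model.

    This splicing step is the crux of the RAG theory: retrieved text
    flows into the generated answer at inference time, independently
    of anything the model learned during training.
    """
    passages = "\n".join(text for _, text in retrieve(query))
    return f"Context:\n{passages}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What is photosynthesis in plants?"))
```

The point of the sketch is the separation of concerns: nothing in `retrieve` or `build_prompt` touches model weights, which is why the lawsuit can target this stage on its own, distinct from any training-data claim.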

The Lanham Act Claim: Hallucinations Attributed to Britannica

The lawsuit includes a second, unusually creative legal theory: a Lanham Act claim based on hallucinations. The Lanham Act prohibits false designation of origin — essentially, falsely attributing content to a source. The plaintiffs allege that ChatGPT sometimes generates factually incorrect content and presents it as if it came from authoritative sources like Britannica and Merriam-Webster — creating the impression that Britannica endorsed the hallucinated content. This theory has not been tested in court for AI hallucination, and its success would create a novel liability framework for AI companies whose systems generate incorrect content while implying authoritative sourcing.

How This Case Fits Into the Growing Publisher vs. AI Litigation Landscape

  • New York Times v. OpenAI (ongoing): the NYT alleges both training data use and that ChatGPT reproduces NYT articles too closely in responses. OpenAI's main defense is fair use plus the argument that any close reproduction is a model bug, not an intended feature.
  • Ziff Davis and newspaper coalition lawsuits: a coalition of news publishers filed a coordinated set of lawsuits against AI companies in late 2024, all targeting training data use. These cases follow a similar theory to the NYT case.
  • The Authors Guild cases: George R.R. Martin, John Grisham, and other prominent authors filed suit over training data use of their works. These cases address the creative writing domain rather than factual reference content.
  • What distinguishes the Britannica/Merriam case: all previous major cases focus primarily on training data. Britannica's RAG theory is new — it targets the live inference process, which has no good fair use defense if the reproduction is substantial. If the RAG theory succeeds, it could require AI companies to license content for retrieval use separately from training use.

What the Outcome Could Mean for AI Products

  • If Britannica wins on the RAG theory: AI companies may be required to license content from publishers before including it in RAG retrieval databases. This would create a new content licensing market for AI retrieval — similar to how music streaming services license music — and could significantly increase the cost structure of AI search products.
  • If OpenAI wins on fair use: a ruling that RAG retrieval constitutes fair use would validate the current architecture of most AI search products and resolve the most acute near-term legal risk for AI companies that use live web retrieval.
  • Likely settlement territory: given the reputational and financial risks of an adverse ruling for both sides, settlement is the most probable outcome. The settlement terms, particularly whether they involve a one-time payment or an ongoing content licensing framework, will be more revealing than a verdict would have been.
  • The Merriam-Webster dimension: Merriam-Webster's claim is particularly interesting because dictionary definitions are among the most frequently retrieved factual content in AI search responses. A finding that AI companies must license dictionary content for retrieval use would affect virtually every major AI search product.

Pro Tip: For anyone following AI copyright law, the most important case to track in 2026 is not the Britannica case itself but the New York Times case, which will likely be resolved first and establish the fair use framework against which all subsequent cases will be evaluated. The NYT case covers both training data and response reproduction, so its outcome will define the boundaries within which the Britannica RAG theory must operate. The NYT verdict, when it comes, may be the most consequential single legal event in the AI industry's history.
