A Princeton-led study found that three plain editing moves (adding statistics, citing sources, and adding quotations) raised a page's visibility inside AI-generated answers by up to 40%. Getting cited by an AI engine does not depend on a secret algorithm, and it is not the same game as ranking on Google. There is also one fact most guides skip that quietly decides everything: the AI crawlers named later in this guide (GPTBot, ClaudeBot, and PerplexityBot) do not run JavaScript. If a fact is not in your raw HTML, the model never sees it. This guide is the verified, durable version of how to be the source an AI quotes.
Quick takeaways: the top three moves that earn AI citations are adding statistics, citing primary sources, and adding quotations (Princeton GEO study, up to 40% visibility gain). AI crawlers do not execute JavaScript, so server-render your facts. You no longer need to rank in the top 10 to be cited. And llms.txt, despite the hype, is not used by the major answer engines yet, so treat it as low-cost housekeeping, not a ranking lever.
What getting cited actually means now
Search is splitting into two jobs. The old job is ranking: appearing as a blue link a human clicks. The new job is being cited: appearing as a source an AI engine reads, summarizes, and credits inside its answer. The second job has its own name, or rather several. Generative Engine Optimization (GEO) is the academic term, from the Princeton-led paper that coined it. Answer Engine Optimization (AEO) and LLM SEO are the industry terms for the same practice. There is no agreed line between them, and they are used interchangeably. GEO is the only one with a peer-reviewed origin, which is why it is worth anchoring on what that study actually measured rather than on vendor folklore.
Why this matters even if you are skeptical of the hype: AI answers are eating the top of search. Industry trackers put US generative-AI-search usage at a large and growing share of the population, and Google AI Overviews reach a vast monthly audience. The page that gets quoted in the answer, not the page sitting at position 3 of the blue links, is the one that gets seen. Optimizing to be quoted is the durable skill.
The Princeton study: what actually moved the needle
The GEO paper built a benchmark of 10,000 real queries across 25 domains and tested nine editing strategies to see which made a page more likely to be surfaced and credited inside a generative answer. The result is refreshingly unglamorous. The winners were not clever keyword tricks. They were the habits of good, trustworthy writing.
| Editing move | Effect on AI visibility | Why it works |
|---|---|---|
| Add statistics | Among the top methods, roughly 30 to 40% relative gain | Concrete numbers read as authoritative and quotable |
| Cite primary sources | Top method; largest gain for lower-ranked pages | Models prefer claims that trace to evidence |
| Add quotations | Around 22% on the Perplexity real-world test | Quotes give the model a ready-made citable unit |
| Keyword stuffing | Underperformed | Adds no information the model can use |
| Fluency-only polish | Weak | Reads nicer but adds nothing extractable |
Two details from the study are worth holding onto. First, on a live test against Perplexity, adding statistics produced the strongest lift, a gain in the high 30s in percentage terms, and adding quotations produced a smaller but real gain of around 22%. Second, and most encouraging, the lower your page ranks, the more these moves help. The paper reports that a page sitting around position five could see a large relative visibility jump from the cite-sources method alone. GEO rewards substance, and it rewards the underdog who adds it.
The one-line version of the research: write the way a good encyclopedia entry reads. Lead with the fact, attach a number, name the source. That is not a trick. It is what a model is built to trust and repeat.
The fact most guides miss: AI crawlers do not run JavaScript
You can do everything above and still be invisible if your content only appears after the page runs its JavaScript. Vercel analyzed the major AI crawlers and found none of them render JavaScript. GPTBot, ClaudeBot, and PerplexityBot fetch the raw HTML and stop. Anything injected by client-side code (a price that loads after the initial response, an FAQ whose answers are fetched on click, a spec table built by a script) is simply not there as far as the model is concerned.
The two-second test: open your page, disable JavaScript in the browser, and reload. Whatever disappears is invisible to AI engines. Put your key facts, prices, FAQ answers, and structured data in the server-rendered HTML so they survive the test.
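The same test can be scripted. The sketch below is one hedged way to do it, not a standard tool: it takes raw HTML as a string, keeps only the text a non-rendering crawler could read (skipping `<script>`, `<style>`, and `<template>` contents), and reports which key facts are missing. The function names `visible_text` and `missing_facts` and the sample markup are illustrative assumptions.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes, ignoring anything inside script/style/template."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_text(raw_html: str) -> str:
    """Return the whitespace-normalized text present in the raw HTML."""
    parser = _VisibleText()
    parser.feed(raw_html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def missing_facts(raw_html: str, facts: list[str]) -> list[str]:
    """Return the facts that do NOT appear in the raw, unrendered HTML."""
    text = visible_text(raw_html).casefold()
    return [f for f in facts if " ".join(f.split()).casefold() not in text]


# Hypothetical pages: one server-rendered, one that injects the price via JS.
SERVER_RENDERED = (
    "<html><body><h1>Pro plan</h1>"
    "<p>Price: <b>$49</b> per month</p></body></html>"
)
CLIENT_RENDERED = (
    '<html><body><div id="app"></div>'
    '<script>render({price: "$49 per month"})</script></body></html>'
)

print(missing_facts(SERVER_RENDERED, ["$49 per month"]))  # []
print(missing_facts(CLIENT_RENDERED, ["$49 per month"]))  # ['$49 per month']
```

To audit a live page, pass in the body of a plain HTTP fetch, for example `urllib.request.urlopen(url).read().decode()` (assuming the page is UTF-8). The matching is deliberately simple substring search, so a fact split mid-word across tags would be reported as missing.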
You do not need to rank number one anymore
Here is the shift that changes the math. AI engines cite well beyond the classic top ten. Ahrefs, analyzing a large set of AI Overviews, reported that fewer than 40% of AI Overview citations come from pages ranking in the top ten, a share that has been falling over time. The answer layer is pulling sources from deeper in the results than traditional search ever did. For a smaller site, that is the opening: you can be cited for a specific, well-evidenced answer without first winning the brutal fight for a top-three blue link.
llms.txt: the honest version
You will be told to add an llms.txt file. Here is the accurate picture. llms.txt is a proposed standard from Jeremy Howard of Answer.AI, published in September 2024: a Markdown file at the root of your site that gives an LLM a curated, machine-readable index of your important pages. The idea is sound and the file is cheap to add. What it is not, yet, is a ranking signal.
- Google does not use it. John Mueller of Google said that, judging by server logs, the major AI services do not even check for the file, and he compared it to the long-discredited keywords meta tag.
- It did not move visibility in testing. Search Engine Land tracked ten sites for 90 days; eight saw no measurable change, and the movements on the rest were not attributable to llms.txt.
- The recent Chrome Lighthouse check for llms.txt sits in an agentic-browsing category, which is about helping agents crawl efficiently, not about ranking. Google has said you do not need new machine-readable files to appear in AI search.
So add llms.txt if you maintain docs and want agents to navigate them cleanly. It is harmless and occasionally helpful for tooling. Just do not expect it to get you cited. The citation comes from the content itself.
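If you do add the file, it is small enough to generate. The sketch below assumes the layout described in the llms.txt proposal as commonly summarized (an H1 title, a blockquote summary, then H2 sections containing Markdown link lists); the helper name `build_llms_txt`, the site name, and the example.com URLs are hypothetical.

```python
def build_llms_txt(title, summary, sections):
    """Render an llms.txt-style Markdown index.

    sections maps a heading to a list of (name, url, note) tuples;
    note may be an empty string.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            entry = f"- [{name}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site used purely for illustration.
EXAMPLE = build_llms_txt(
    "Example Docs",
    "Reference documentation for the Example API.",
    {
        "Docs": [
            ("Quickstart", "https://example.com/docs/quickstart.md",
             "Install and first request"),
            ("API reference", "https://example.com/docs/api.md", ""),
        ],
    },
)
print(EXAMPLE)
```

Serve the result as a static file at the site root. As the points above make clear, this helps tooling navigate your docs and should not be expected to change whether you are cited.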
The durable checklist
None of this expires when the next model ships, because it is about being genuinely useful and genuinely machine-readable. Work this list and you are optimizing for every answer engine at once.
- Answer first. Put the direct answer in the opening lines of the page and of each section. Models lift the part that answers the question, so make it easy to find.
- Attach a number. Replace vague claims with a specific statistic, and state where it came from.
- Cite primary sources by name. Link the study, the doc, the original, not a roundup of a roundup.
- Use quotable units. Write short, self-contained sentences that each carry one clear fact a model can quote without needing the surrounding context.
- Server-render the facts. If it is not in the raw HTML, it does not exist to a crawler.
- Add structured data. Schema for author, date, FAQ, and article type helps a parser label what your page is about.
- Keep it fresh. AI answers lean toward recently updated content, so revisit and date your cornerstone pages.
- Write for one clear entity. Be unambiguous about who you are and what the page is about, so the model represents you correctly.
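The structured-data and server-rendering items can be combined in one step. The sketch below builds schema.org `FAQPage` JSON-LD from question-and-answer pairs and wraps it in a script tag meant to be emitted in the server-rendered HTML. The helper names `faq_jsonld` and `as_script_tag` are illustrative, and whether a given answer engine reads this markup is an assumption rather than something the sources above establish.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Serialize JSON-LD into a script tag for the server-rendered page."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so answer text can never close the script element early;
    # "\/" is a valid JSON escape, so parsers still read the same string.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


FAQ_TAG = as_script_tag(
    faq_jsonld(
        [
            (
                "Do I need to rank number one to be cited by AI?",
                "No. A well-evidenced answer on a specific question can be "
                "cited without first winning a top-three position.",
            )
        ]
    )
)
print(FAQ_TAG)
```

Because the tag is plain text in the HTML response, it passes the JavaScript-disabled test, which would not hold if the same JSON-LD were injected by client-side code.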
There is a bigger pattern underneath all of this. The reader of your content is no longer only a human scrolling; it is increasingly a model parsing. The same discipline applies to private knowledge, which is the problem a memory layer like MemX solves: it keeps your own documents, notes, and decisions in a form an AI can actually retrieve and answer from. Public content optimizes to be cited by the world's models. Your private knowledge deserves the same machine-readable discipline, just pointed at you.
Key takeaway: you get cited by AI engines the same way you earn trust from a careful editor. Lead with the answer, back every claim with a number and a named source, and make sure all of it is in the raw HTML. The tactics that chase algorithms age out. Substance does not.
01 What is Generative Engine Optimization (GEO)?
GEO is the practice of writing and structuring content so that AI engines like ChatGPT, Perplexity, and Google AI Overviews cite it in their answers. The term comes from a Princeton-led paper that found adding statistics, citing sources, and adding quotations raised a page's visibility in AI answers by up to 40%.
02 How is GEO different from SEO?
SEO optimizes to rank as a clickable link. GEO optimizes to be quoted as a source inside an AI-generated answer. They overlap, but GEO rewards extractable substance (numbers, citations, clear answers) and is less dependent on ranking in the top ten, since AI engines cite pages from well beyond it.
03 Does adding an llms.txt file help me rank in AI search?
Not currently. llms.txt is a useful, low-cost convention for helping agents navigate your docs, but the major answer engines do not use it as a ranking signal. Google's John Mueller has said the major AI services do not even check for the file and likened it to the discredited keywords meta tag. Focus on the content itself.
04 Why is my content not showing up in AI answers?
A common and overlooked reason is JavaScript. The major AI crawlers do not render JavaScript, so any content injected by client-side code is invisible to them. Load your page with JavaScript disabled; whatever disappears cannot be cited. Server-render your key facts and structured data.
05 Do I need to rank number one to be cited by AI?
No. Analysis of AI Overviews found fewer than 40% of citations come from pages ranking in the top ten, a share that has been falling over time. A well-evidenced answer on a specific question can be cited without first winning a top-three position.
