Most marketing teams have heard of robots.txt. A lot fewer have heard of llms.txt. That gap is the entire opportunity. While teams pour budget into traditional SEO, a tiny file at the root of your domain is quietly becoming one of the highest-leverage signals for getting cited by ChatGPT, Perplexity, Claude, and Google's AI Overviews. It costs almost nothing to ship and almost no competitor has one yet.
This guide is the technical walkthrough we wish had existed when we started building llms.txt files for client sites: what the file is, where it came from, what to put in it, how it differs from the files you already have, and how to validate it. If you want the broader strategic context first, read our pillar guide on what AEO actually is. Otherwise, let's get into the file itself.
llms.txt is a Markdown file you place at the root of your domain. The path is always yoursite.com/llms.txt. It gives large language models a structured, hand-curated map of the most important content on your site, formatted in a way the model can parse in a single pass without crawling.
The format is opinionated. An H1 with the site or product name. A short blockquote summarizing what the site is. Then a series of H2 sections (Docs, Pricing, Case Studies, About, FAQ) that group your most important URLs, each with a one-line description. The whole file is usually 50 to 200 lines.
The point is curation, not coverage. sitemap.xml lists every page. llms.txt lists the pages that matter and tells the model what to read first. When ChatGPT or Perplexity is deciding whether to cite your site in an answer, llms.txt is the cheat sheet that tells it where to look.
The proposal came from Jeremy Howard (co-founder of fast.ai and Answer.AI) in September 2024. The motivation was simple. Reasoning models were burning tokens trying to scrape and parse arbitrary websites, and the most useful pages on most sites were buried in nav menus, JavaScript, and analytics tags. A simple, predictable file at a known location would let models skip the noise.
Support spread fast: by early 2026, Anthropic, OpenAI, and Perplexity had all signalled that they parse llms.txt when present. The convention is not yet a ratified standard, but neither was robots.txt for most of its life (it ran as a de facto convention from 1994 until the IETF formalized it as RFC 9309 in 2022), and that didn't stop it from becoming load-bearing infrastructure. Anyone in technical SEO who waited for robots.txt to become official would have sat out decades of competitive advantage.
llms.txt is intentionally simple Markdown. Here is a minimal example for a SaaS product:
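The sketch below uses an invented product; every name, URL, and description is illustrative.

```
# Acme Analytics

> Acme Analytics is a product analytics platform for B2B SaaS teams. It tracks
> feature adoption, retention, and revenue impact without engineering time.

## Docs

- [Quickstart](https://acme.example/docs/quickstart): install the snippet and send your first event in under ten minutes
- [API Reference](https://acme.example/docs/api): REST endpoints for events, users, and cohorts

## Pricing

- [Pricing](https://acme.example/pricing): tiered SaaS pricing with annual and monthly options, free trial available

## Case Studies

- [How Initech cut churn 18%](https://acme.example/customers/initech): retention analysis case study with full methodology

## Optional

- [Blog](https://acme.example/blog): product updates and analytics guides
```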
That is the whole format: an H1, a blockquote, H2 sections, and bulleted links with one-line descriptions. An H2 named "Optional" at the bottom signals to the model which sections it can skip if it is operating under a token budget.
This is where most teams get confused. The three files do different jobs and you need all three. They are not substitutes.
robots.txt tells crawlers what they can and cannot access. It is binary. Allow or disallow. It does not describe content, it does not rank pages, it just gates access. For AI crawlers specifically, you should explicitly allow GPTBot, ClaudeBot, PerplexityBot, and Google-Extended in your robots.txt if you want to be cited. Many sites have these blocked by default and never check.
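The relevant robots.txt rules might look like the sketch below. Merge these with your existing directives rather than replacing the file wholesale, since other rules (disallowed paths, sitemap pointers) still apply.

```
# Explicitly allow the major AI crawlers site-wide
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```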
sitemap.xml is the comprehensive list of every URL on your site. It exists so crawlers (search and AI alike) can find every page without relying on internal links. It tells crawlers what exists. It does not tell them what matters.
llms.txt is the editorial layer on top. It says: here is what this site is, here are the pages that actually matter, here is the order to read them in, and here is a one-line description of each so you can decide what to cite. It is the only one of the three that is written for a reasoning model rather than a crawler.
If robots.txt is the door and sitemap.xml is the table of contents, llms.txt is the executive summary.
Lead with what the product does (the blockquote), then sections for Docs, Pricing, Case Studies, Changelog, and About. Documentation is the highest-value section because it is what AI engines cite most when answering "how do I do X with Y product." If your docs are buried two clicks deep, llms.txt is how you surface them.
Lead with what the publication covers, then sections for the major topic areas (each with 3 to 5 representative articles), Author pages, and About. Author pages are critical here because they are the primary E-E-A-T signal AI engines use to weight whether to cite a piece.
Lead with what the agency does and who it serves, then sections for Services, Case Studies, About, and Locations (if relevant). Case studies are the citation-bait. They are concrete proof of work and AI engines pull from them disproportionately when answering "best agency for X" type queries.
Lead with what the store sells and the brand promise, then sections for Top Categories, Bestsellers, Brand Story, and Customer Support. Skip the long tail of product URLs. The point is to give the model the brand context, not the catalog.
The biggest mistake is dumping every URL on the site into llms.txt. The whole point is curation. If you have 4,000 pages, llms.txt should reference maybe 30 of them, organized into clear sections. The rest belong in sitemap.xml where they always belonged.
A bare list of links is half the value. The one-line descriptions are what tell the model whether to follow a link or move on. "Pricing" is useless. "Pricing: tiered SaaS pricing with annual and monthly options, free trial available" is useful.
llms.txt is part of your shipping pipeline now. Every time you delete or rename a URL, you need to update llms.txt. A 404 in llms.txt is worse than no llms.txt at all because it tells the model your editorial layer is unreliable.
The file must be served as text/plain or text/markdown. Some hosting setups will serve any unknown file as application/octet-stream, which makes some parsers ignore it. Check your headers.
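If your host gets this wrong, an explicit rule fixes it. On Netlify, for example, a `_headers` file entry like the following works; nginx, Apache, and other servers have equivalent directives (this is a sketch for one hosting setup, not a universal recipe):

```
/llms.txt
  Content-Type: text/markdown; charset=utf-8
```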
If a model is running under a token budget, the Optional H2 tells it what it can skip without losing the core picture. Putting your blog or low-priority sections under Optional is good signal hygiene.
There is no formal validator yet, but the practical checklist is short:

- The file is valid Markdown.
- It is served as text/plain or text/markdown.
- Every linked URL resolves with a 200 status.
- The structure follows the H1, blockquote, H2-sections pattern.
The Meridian15 AEO audit tool checks for the presence of a valid llms.txt at the site root as one of its 12 weighted criteria. It is the heaviest single check at 12 points because the file is so easy to ship and so few sites have one.
Three steps. First, draft the file by hand. Do not generate it from a sitemap dump, because the value is in the curation. Second, drop the file at the root of your domain (next to robots.txt and sitemap.xml). Most static hosts (Netlify, Vercel, Cloudflare Pages) just need it in your public folder. Third, validate it with our free AEO readiness check to confirm it is reachable, well-formed, and serving the right content type.
From there, treat it as a living file. Update it whenever you publish a meaningful new page or retire an old one. Most teams that ship llms.txt and never touch it again leak value within a quarter as the file drifts out of sync with the site.
The competitive window on llms.txt is open and small. Right now, fewer than 5% of marketing sites have one. That number will be 50% within 18 months. The teams that ship in 2026 get cited disproportionately while the file is rare. The teams that wait will be shipping it as table stakes alongside everyone else, with no compounding advantage.
This is the same dynamic that played out with structured data, with HTTPS, with mobile-responsive design. The early movers got outsized lift. The late movers got parity at higher cost.
If you want the broader playbook (structured data, FAQ schema, named authors, citation-worthy content patterns), read our pillar guide on AEO. If you just want to know whether your site is set up correctly, run it through our AEO audit tool. Both are free.
llms.txt is a Markdown file you place at the root of your domain (yoursite.com/llms.txt) that gives large language models a clean, structured map of the most important content on your site. It was proposed by Jeremy Howard in late 2024 as an AI-era equivalent of robots.txt and sitemap.xml. Instead of telling crawlers what to allow, it tells reasoning models what to read and how to interpret it.
Not yet. It is a proposed convention rather than a ratified standard, similar to how robots.txt was for many years. As of 2026, major AI engines including Anthropic, OpenAI, and Perplexity have all signalled support for parsing llms.txt files when they exist, and adoption has spread quickly across documentation sites, SaaS marketing pages, and editorial publishers.
robots.txt is a permissions file. It tells crawlers what they can and cannot access. sitemap.xml is a discovery file. It lists every URL on your site so crawlers can find them. llms.txt is a curation file. It tells reasoning models which pages matter, in what order, and provides short context so the model can decide what to cite without crawling the full site. The three are complementary, not substitutes.
Start with an H1 of your site or product name, a short blockquote describing what the site is, and then group your most important pages under H2 sections like Docs, Pricing, Case Studies, About, or FAQ. Each link gets a one-line description. Keep it under 200 lines if possible. Resist the urge to dump every URL. The point is curation, not coverage.
On its own, no. llms.txt is one signal in a stack that also includes JSON-LD schema, FAQ markup, named authors, content depth, and citation-worthy formatting. But it is one of the cheapest signals to ship and one of the highest-leverage ones because almost no competitor has it yet. Sites that combine llms.txt with structured data and substantive content are showing meaningful lift in AI citations across ChatGPT, Perplexity, and Claude.
There is no W3C-style validator yet, but the practical checks are: the file must be valid Markdown, it must be served as text/plain or text/markdown, all linked URLs must resolve to 200 status, and the structure should follow the H1 plus blockquote plus H2 sections pattern. The Meridian15 AEO audit tool checks for the presence of llms.txt at the site root as one of its 12 weighted criteria.
Run any URL through the Meridian15 AEO audit. We score 12 weighted criteria including llms.txt, schema, robots, and content depth. See the full SEO and AEO service.
Run the AEO Audit