Most marketing teams have heard of robots.txt. A lot fewer have heard of llms.txt. That gap is the entire opportunity. While teams pour budget into traditional SEO, a tiny file at the root of your domain is quietly becoming one of the highest-leverage signals for getting cited by ChatGPT, Perplexity, Claude, and Google's AI Overviews. It costs almost nothing to ship and almost no competitor has one yet.
This guide is the technical walkthrough we wish had existed when we started building llms.txt files for client sites: what the file is, where it came from, what to put in it, how it differs from the files you already have, and how to validate it. If you want the broader strategic context first, read our pillar guide on what AEO actually is. Otherwise, let's get into the file itself.
llms.txt is a Markdown file you place at the root of your domain. The path is always yoursite.com/llms.txt. It gives large language models a structured, hand-curated map of the most important content on your site, formatted in a way the model can parse in a single pass without crawling.
The format is opinionated. An H1 with the site or product name. A short blockquote summarizing what the site is. Then a series of H2 sections (Docs, Pricing, Case Studies, About, FAQ) that group your most important URLs, each with a one-line description. The whole file is usually 50 to 200 lines.
The point is curation, not coverage. sitemap.xml lists every page. llms.txt lists the pages that matter and tells the model what to read first. When ChatGPT or Perplexity is deciding whether to cite your site in an answer, llms.txt is the cheat sheet that tells it where to look.
The proposal came from Jeremy Howard (co-founder of fast.ai and Answer.AI) in September 2024. The motivation was simple. Reasoning models were burning tokens trying to scrape and parse arbitrary websites, and the most useful pages on most sites were buried in nav menus, JavaScript, and analytics tags. A simple, predictable file at a known location would let models skip the noise.
Support spread fast: by early 2026, Anthropic, OpenAI, and Perplexity had all signalled that they parse llms.txt when present. The convention is not yet a ratified standard, but neither was robots.txt for most of its life (it ran as a de facto convention from 1994 until the IETF formalized it as RFC 9309 in 2022), and that didn't stop it from becoming load-bearing infrastructure. Anyone in technical SEO who waited for robots.txt to become official would have sat out decades of competitive advantage.
llms.txt is intentionally simple Markdown. Here is a minimal example for a SaaS product:
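The sketch below uses an invented product; every name, URL, and description is illustrative.

```
# Acme Analytics

> Acme Analytics is a product analytics platform for B2B SaaS teams. It tracks
> feature adoption, retention, and revenue impact without engineering time.

## Docs

- [Quickstart](https://acme.example/docs/quickstart): install the snippet and send your first event in under ten minutes
- [API Reference](https://acme.example/docs/api): REST endpoints for events, users, and cohorts

## Pricing

- [Pricing](https://acme.example/pricing): tiered SaaS pricing with annual and monthly options, free trial available

## Case Studies

- [How Initech cut churn 18%](https://acme.example/customers/initech): retention analysis case study with full methodology

## Optional

- [Blog](https://acme.example/blog): product updates and analytics guides
```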
That is the whole format: an H1, a blockquote, H2 sections, and bulleted links with one-line descriptions. An H2 named "Optional" at the bottom signals to the model which sections it can skip if it is operating under a token budget.
This is where most teams get confused. The three files do different jobs and you need all three. They are not substitutes.
robots.txt tells crawlers what they can and cannot access. It is binary. Allow or disallow. It does not describe content, it does not rank pages, it just gates access. For AI crawlers specifically, you should explicitly allow GPTBot, ClaudeBot, PerplexityBot, and Google-Extended in your robots.txt if you want to be cited. Many sites have these blocked by default and never check.
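The relevant robots.txt rules might look like the sketch below. Merge these with your existing directives rather than replacing the file wholesale, since other rules (disallowed paths, sitemap pointers) still apply.

```
# Explicitly allow the major AI crawlers site-wide
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```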
sitemap.xml is the comprehensive list of every URL on your site. It exists so crawlers (search and AI alike) can find every page without relying on internal links. It tells crawlers what exists. It does not tell them what matters.
llms.txt is the editorial layer on top. It says: here is what this site is, here are the pages that actually matter, here is the order to read them in, and here is a one-line description of each so you can decide what to cite. It is the only one of the three that is written for a reasoning model rather than a crawler.
If robots.txt is the door and sitemap.xml is the table of contents, llms.txt is the executive summary.
Lead with what the product does (the blockquote), then sections for Docs, Pricing, Case Studies, Changelog, and About. Documentation is the highest-value section because it is what AI engines cite most when answering "how do I do X with Y product." If your docs are buried two clicks deep, llms.txt is how you surface them.
Lead with what the publication covers, then sections for the major topic areas (each with 3 to 5 representative articles), Author pages, and About. Author pages are critical here because they are the primary E-E-A-T signal AI engines use to weight whether to cite a piece.
Lead with what the agency does and who it serves, then sections for Services, Case Studies, About, and Locations (if relevant). Case studies are the citation-bait. They are concrete proof of work and AI engines pull from them disproportionately when answering "best agency for X" type queries.
Lead with what the store sells and the brand promise, then sections for Top Categories, Bestsellers, Brand Story, and Customer Support. Skip the long tail of product URLs. The point is to give the model the brand context, not the catalog.
The biggest mistake is dumping every URL on the site into llms.txt. The whole point is curation. If you have 4,000 pages, llms.txt should reference maybe 30 of them, organized into clear sections. The rest belong in sitemap.xml where they always belonged.
A bare list of links is half the value. The one-line descriptions are what tell the model whether to follow a link or move on. "Pricing" is useless. "Pricing: tiered SaaS pricing with annual and monthly options, free trial available" is useful.
llms.txt is part of your shipping pipeline now. Every time you delete or rename a URL, you need to update llms.txt. A 404 in llms.txt is worse than no llms.txt at all because it tells the model your editorial layer is unreliable.
The file must be served as text/plain or text/markdown. Some hosting setups will serve any unknown file as application/octet-stream, which makes some parsers ignore it. Check your headers.
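If your host gets this wrong, an explicit rule fixes it. On Netlify, for example, a `_headers` file entry like the following works; nginx, Apache, and other servers have equivalent directives (this is a sketch for one hosting setup, not a universal recipe):

```
/llms.txt
  Content-Type: text/markdown; charset=utf-8
```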
If a model is running under a token budget, the Optional H2 tells it what it can skip without losing the core picture. Putting your blog or low-priority sections under Optional is good signal hygiene.
There is no formal validator yet, but the practical checklist is short:

- The file is valid Markdown.
- It is served as text/plain or text/markdown.
- Every linked URL resolves with a 200 status.
- The structure follows the H1, blockquote, H2-sections pattern.
The Meridian15 AEO audit tool checks for the presence of a valid llms.txt at the site root as one of its 12 weighted criteria. It is the heaviest single check at 12 points because the file is so easy to ship and so few sites have one.
Three steps. First, draft the file by hand. Do not generate it from a sitemap dump, because the value is in the curation. Second, drop the file at the root of your domain (next to robots.txt and sitemap.xml). Most static hosts (Netlify, Vercel, Cloudflare Pages) just need it in your public folder. Third, validate it with our free AEO readiness check to confirm it is reachable, well-formed, and serving the right content type.
From there, treat it as a living file. Update it whenever you publish a meaningful new page or retire an old one. Most teams that ship llms.txt and never touch it again leak value within a quarter as the file drifts out of sync with the site.
The competitive window on llms.txt is open and small. Right now, fewer than 5% of marketing sites have one. That number will be 50% within 18 months. The teams that ship in 2026 get cited disproportionately while the file is rare. The teams that wait will be shipping it as table stakes alongside everyone else, with no compounding advantage.
This is the same dynamic that played out with structured data, with HTTPS, with mobile-responsive design. The early movers got outsized lift. The late movers got parity at higher cost.
If you want the broader playbook (structured data, FAQ schema, named authors, citation-worthy content patterns), read our pillar guide on AEO. If you just want to know whether your site is set up correctly, run it through our AEO audit tool. Both are free.
llms.txt is a Markdown file you place at the root of your domain (yoursite.com/llms.txt) that gives large language models a clean, structured map of the most important content on your site. It was proposed by Jeremy Howard in late 2024 as an AI-era equivalent of robots.txt and sitemap.xml. Instead of telling crawlers what to allow, it tells reasoning models what to read and how to interpret it.
Not yet. It is a proposed convention rather than a ratified standard, similar to how robots.txt was for many years. As of 2026, major AI engines including Anthropic, OpenAI, and Perplexity have all signalled support for parsing llms.txt files when they exist, and adoption has spread quickly across documentation sites, SaaS marketing pages, and editorial publishers.
robots.txt is a permissions file. It tells crawlers what they can and cannot access. sitemap.xml is a discovery file. It lists every URL on your site so crawlers can find them. llms.txt is a curation file. It tells reasoning models which pages matter, in what order, and provides short context so the model can decide what to cite without crawling the full site. The three are complementary, not substitutes.
Start with an H1 of your site or product name, a short blockquote describing what the site is, and then group your most important pages under H2 sections like Docs, Pricing, Case Studies, About, or FAQ. Each link gets a one-line description. Keep it under 200 lines if possible. Resist the urge to dump every URL. The point is curation, not coverage.
On its own, no. llms.txt is one signal in a stack that also includes JSON-LD schema, FAQ markup, named authors, content depth, and citation-worthy formatting. But it is one of the cheapest signals to ship and one of the highest-leverage ones because almost no competitor has it yet. Sites that combine llms.txt with structured data and substantive content are showing meaningful lift in AI citations across ChatGPT, Perplexity, and Claude.
There is no W3C-style validator yet, but the practical checks are: the file must be valid Markdown, it must be served as text/plain or text/markdown, all linked URLs must resolve to 200 status, and the structure should follow the H1 plus blockquote plus H2 sections pattern. The Meridian15 AEO audit tool checks for the presence of llms.txt at the site root as one of its 12 weighted criteria.
Run any URL through the Meridian15 AEO audit. We score 12 weighted criteria including llms.txt, schema, robots, and content depth. See the full SEO and AEO service.
Run the AEO Audit