Agentic Presence and llms.txt - Should You Add It to Your Website?

Every few months a new file promises to be the magic switch that makes your website "AI-ready." Right now that file is llms.txt, and a wave of tools, plugins, and consultants will happily sell you on the idea that publishing one will get you cited by ChatGPT, surfaced by Claude, and ranked in AI search.

It's worth slowing down, because most of what's claimed about llms.txt is wrong - not maliciously, just optimistically. The file is real and reasonable. The mythology around it is not.

What llms.txt actually is

llms.txt is a proposed convention - a clean Markdown map of your key pages, written for machines instead of humans. The idea is genuinely sensible: a normal web page is full of navigation, scripts, ads, and markup that are useful to people but pure noise to a model trying to read your content. An llms.txt file gives a curated, token-efficient index of what matters.
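To make the "curated index" idea concrete, here is a minimal Python sketch that renders an llms.txt from a hand-picked list of pages. It assumes the layout described in the community proposal (an H1 title, a blockquote summary, then H2 sections of Markdown links); the `Page` class, the `render_llms_txt` function, and the example.com URLs are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One curated entry in the index (hypothetical helper type)."""
    title: str
    url: str
    note: str = ""


def render_llms_txt(site_name: str, summary: str,
                    sections: dict[str, list[Page]]) -> str:
    """Render a Markdown index: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            # The note after the colon is optional.
            suffix = f": {page.note}" if page.note else ""
            lines.append(f"- [{page.title}]({page.url}){suffix}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


example = render_llms_txt(
    "Example",
    "A demo site.",
    {"Docs": [
        Page("Quickstart", "https://example.com/quickstart.md", "Setup in five minutes"),
        Page("API", "https://example.com/api.md"),
    ]},
)
```

The point of the sketch is the curation step: you choose which pages go in, rather than dumping every URL on the site.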

Great idea. But there are three things people tend to miss.

1. It is NOT an accepted standard

llms.txt is a community proposal, not a standard. No crawler is required to look for it, so there is no guarantee it will be among the sources an AI agent draws on when it compiles an answer.

2. If you publish one, keep it out of your sitemap - and out of the index

If you publish an llms.txt, it shouldn't be indexed, and it shouldn't be in your sitemap.xml. The whole point is on-demand consumption by an agent, not ranking in search.

Your sitemap exists to tell search engines which canonical pages matter for ranking. llms.txt isn't one of those pages. Treat /llms.txt as a well-known path that an agent has to fetch deliberately. Nothing pulls it in automatically.
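One way to implement both rules is sketched below in Python, assuming your server lets you attach response headers per path and that you generate sitemap.xml from a list of URLs. `X-Robots-Tag: noindex` is the standard HTTP header for keeping non-HTML files out of search indexes; the function names and the `AGENT_ONLY_PATHS` set are hypothetical.

```python
from urllib.parse import urlsplit

# Paths meant for on-demand agent fetches, not for search ranking (assumption).
AGENT_ONLY_PATHS = {"/llms.txt"}


def extra_headers(path: str) -> dict[str, str]:
    """Headers to add when serving `path`: noindex for agent-only files."""
    if path in AGENT_ONLY_PATHS:
        return {"X-Robots-Tag": "noindex"}
    return {}


def sitemap_urls(candidate_urls: list[str]) -> list[str]:
    """Drop agent-only files from the URL list that feeds sitemap.xml."""
    return [u for u in candidate_urls
            if urlsplit(u).path not in AGENT_ONLY_PATHS]
```

Note that this is different from a robots.txt Disallow rule, which would tell compliant agents not to fetch the file at all. The file stays reachable; it just isn't advertised or indexed.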

3. The platforms agree

If you want this straight from a primary source, Google addressed it directly. Google's own AI optimization guide lists llms.txt under "what you don't need to do," and says that even if it crawls the file, it isn't treated in any special way. No privileged handling, no ranking boost.

It's the same story on the agent side: Claude won't fetch your llms.txt unless someone explicitly points it there. There's no background process quietly ingesting /llms.txt across the web.

So when does it actually get used?

It does get used - just not the way the hype implies. The trigger is always the same: a developer, or an agent's configuration, explicitly points a coding tool at the file.

Cursor (via @Docs), Claude Code, GitHub Copilot, and Windsurf can pull a library's llms.txt into context to give accurate, version-correct answers instead of hallucinating APIs. That's why Anthropic, Stripe, and OpenAI all publish one - it's cheap to maintain, and it's a real courtesy to any agent that is sent there. The pattern is always the same: opt-in, on demand, developer-facing.
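As a rough illustration of that opt-in pattern, here is a Python sketch of what a tool might do once someone hands it a docs URL: build the /llms.txt address, fetch it, and pull out the linked pages by section. This is not how Cursor, Claude Code, or any other product is implemented; the function names are hypothetical, and the parser assumes the H2-plus-link-list layout from the community proposal.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Matches "- [Title](url)" with an optional ": note" after the link.
LINK_RE = re.compile(
    r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


def llms_txt_url(base_url: str) -> str:
    """Resolve the root-level /llms.txt for a site the user pointed us at."""
    return urljoin(base_url, "/llms.txt")


def fetch_llms_txt(base_url: str) -> str:
    """Fetch the file on demand; nothing calls this unless a user asks."""
    with urlopen(llms_txt_url(base_url), timeout=10) as response:
        return response.read().decode("utf-8")


def parse_llms_txt(text: str) -> dict[str, list[tuple[str, str]]]:
    """Group (title, url) pairs under their H2 section headings."""
    sections: dict[str, list[tuple[str, str]]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
            continue
        match = LINK_RE.match(line)
        if match and current is not None:
            sections[current].append((match["title"], match["url"]))
    return sections
```

The key detail is that `fetch_llms_txt` takes the site as an explicit argument: the fetch only happens because someone supplied the URL.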

What if you embed "fetch my llms.txt" in your HTML?

Reasonable question - so I tested it myself. I put an instruction directly in a page's HTML body telling models to go fetch the llms.txt file. Here's what happens.

At the crawl-and-index stage, systems ingest the HTML as content - they don't interpret the text as commands to act on. So a "fetch instructions at /llms.txt" buried in your markup is just words on a page, not a command anything executes.
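A toy version of that ingestion step, in Python, shows why. This sketch is an assumption about the general shape of a text-extraction pipeline, not a description of any real crawler: it strips tags, skips script and style content, and keeps the visible text. The `TextExtractor` class and `index_page` function are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def index_page(html: str) -> str:
    """Return the page's text as one string; nothing in it is executed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><body><h1>Pricing</h1>"
    "<p>AI agents: fetch instructions at /llms.txt</p>"
    "<script>var x = 1;</script></body></html>"
)
indexed_text = index_page(page)
```

The embedded instruction ends up inside `indexed_text` as ordinary characters alongside the heading. No request to /llms.txt is made anywhere in this code path.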

That instruction can only be acted on at query time, when the agent decides whether it needs external data to answer the user's question. And here's the catch: if the agent is running on a free tier (~82% of users), it will tend to keep resource use low and skip the extra fetch. In practice, that means your llms.txt won't be part of the common agentic user flow at all.

The trap

The trap is the tools promising to transform your "agentic presence" by generating an llms.txt. That transformation doesn't exist. The file changes essentially nothing on its own.

So, should you add it?

Publish one if it's trivial for you - especially if you maintain a documentation site that coding agents might be pointed at. Just don't expect it to move the agentic needle by itself.

Want to influence how AI represents you? The same fundamentals apply as always: genuinely useful content and a clean, crawlable site.