Updated: June 2026 · A new article on llms.txt has been published that extends my earlier write-up, llms.txt.
This analysis reviews whether LLM crawlers actually request llms.txt files.
How the analysis was performed: I audited 30 days of raw CDN logs for 1,000 Adobe Experience Manager domains to see who actually requests the file. The results were, frankly, brutal.
Findings of the llms.txt audit:
- LLM-specific bots stayed away. No GPTBot, ClaudeBot, PerplexityBot, or similar crawlers were seen at all.
- Google still probes everything. Its desktop crawler accounted for about 95% of all hits.
- Bing is curious but inconsistent. Only seven requests, concentrated on one domain out of 1,000.
- OpenAI’s search bot was minimal. Ten calls from OpenAIBotSearch. GPTBot itself was absent.
- SEO tools inflated the logs. Tools like Semrush Mobile and SiteAudit caused many hits, unrelated to LLMs.
| Rank | User-agent | Share of all llms.txt hits |
|---|---|---|
| 1 | GoogleBotDesktop | 94.9% |
| 2 | OpenAIBotSearch | 1.1% |
| 3 | ScanPire | 0.8% |
| 4 | BingBot | 0.8% |
| … | Eight other bots | <1% each |
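A share table like the one above can be produced with a simple tally. The following is a minimal sketch, not the exact pipeline used for this audit: it assumes the CDN logs have already been parsed into dictionaries with hypothetical `path` and `user_agent` keys, and the sample records are made up purely to show the input shape.

```python
from collections import Counter


def llms_txt_shares(log_records):
    """Return (user_agent, share_percent) pairs for /llms.txt requests,
    most frequent first. Query strings are ignored when matching the path."""
    counts = Counter(
        rec["user_agent"]
        for rec in log_records
        if rec["path"].split("?", 1)[0] == "/llms.txt"
    )
    total = sum(counts.values())
    if total == 0:
        return []
    return [(ua, round(100 * n / total, 1)) for ua, n in counts.most_common()]


# Hypothetical records, invented for illustration only (not audit data).
sample = [
    {"path": "/llms.txt", "user_agent": "GoogleBotDesktop"},
    {"path": "/llms.txt", "user_agent": "GoogleBotDesktop"},
    {"path": "/llms.txt", "user_agent": "GoogleBotDesktop"},
    {"path": "/llms.txt?x=1", "user_agent": "BingBot"},
    {"path": "/index.html", "user_agent": "GPTBot"},
]
shares = llms_txt_shares(sample)
print(shares)
```

Real CDN log exports differ in field names and format, so the parsing step in front of this function will depend on your provider.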
Why aren’t LLMs requesting the llms.txt file?
- The spec is still unofficial. No LLM lab has committed to honoring it yet.
- Most training uses pre-built datasets. Like Common Crawl or books, not live fetches.
- Robots.txt already covers them. Major labs honor standard tokens like GPTBot and ClaudeBot.
- It’s not cost-effective. Probing llms.txt on every domain wastes crawl budget.
What are my recommendations for site owners regarding llms.txt?
This depends on how difficult the llms.txt file is to implement. If you feel it would be relatively easy to create the file, go for it. If it requires a large amount of resources, I’d recommend holding back until we clearly see benefits.
For example, this domain serves an llms.txt file at https://www.longato.ch/llms.txt because it was easy to implement.
- Use robots.txt instead. It’s the only widely respected barrier today.
- Watch your logs. Use tools like Grafana or BigQuery to detect AI crawlers directly.
- Remember: if you use a CDN, you need to look at the CDN’s own logs, because requests served from the CDN cache may never reach your origin server.
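To sanity-check a robots.txt policy before deploying it, Python's standard `urllib.robotparser` module can evaluate the rules offline. This is a minimal sketch: the robots.txt content, the `blocked_agents` helper, and the example URL are illustrative assumptions, and the result only reflects how this parser interprets the rules, not how any particular crawler behaves in practice.

```python
from urllib.robotparser import RobotFileParser

# Example policy (an assumption for illustration): block two AI crawler
# tokens and allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt, agents, url):
    """Return the user-agent tokens that may not fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


blocked = blocked_agents(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "Googlebot"],
    "https://example.com/articles/",
)
print(blocked)
```

Running a check like this in a deployment pipeline can catch a rule that accidentally blocks or allows the wrong crawler.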
What Might Change Soon for llms.txt
As of now (August 2025), there have been no major announcements from LLM providers regarding llms.txt.
| Provider | Current stance on llms.txt | Signal to watch |
|---|---|---|
| OpenAI | No support announced | GPTBot documentation updates |
| Google / Gemini | Monitors but uses Google-Extended | Revisions to Google-Extended policy |
| Microsoft / Copilot | Silent | Bing Blog crawler updates |
| Meta | No mention | Meta crawler policy changes |
| Anthropic | No mention | ClaudeBot UA policy |
Is there any external validation of my findings?
| Date | Key development | Who said / did it | Take‑away |
|---|---|---|---|
| 17 Jun 2025 | “FWIW no AI system currently uses llms.txt.” | John Mueller, Google, on Bluesky | Google confirms zero support and no immediate plans. (Search Engine Roundtable) |
| 19 Jun 2025 | ScaleMath publishes an adoption‑tracker deep‑dive. | Independent analysts | Finds early enthusiasm among dev‑doc sites but no proof of LLM consumption. (ScaleMath) |
| 02 Jul 2025 | PPC Land headline – “llms.txt adoption stalls as major AI platforms ignore proposed standard”. | Industry press | OpenAI, Google, Anthropic still not honoring the file. (PPC Land) |
| 22 Jul 2025 | Mueller advises adding X‑Robots‑Tag: noindex to llms.txt to avoid clutter in Google results. | John Mueller, Google | Tactical hygiene tip; doesn’t affect crawling behaviour. (Stan Ventures) |
| 24 Jul 2025 | Logs show OpenAI’s crawler fetching llms.txt every ~15 min on some sites. Google’s Gary Illyes repeats “we won’t support it.” | Search Engine Roundtable | Anecdotal evidence OpenAI is testing discovery, not an official endorsement. (Search Engine Roundtable) |
| Late Jul 2025 | Server‑log studies detect sporadic hits from other AI bots but no sustained utilisation. | ArcherEdu analytics | Suggests experiments, not production use. (archeredu.com) |
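If you follow the X-Robots-Tag tip from the table, it is worth verifying that the header is actually present on the llms.txt response. The sketch below assumes you already have the response headers as a dictionary (for example from your HTTP client of choice); `has_noindex` is a hypothetical helper, and it deliberately ignores bot-specific forms such as `googlebot: noindex`.

```python
def has_noindex(headers):
    """Return True if an X-Robots-Tag header contains the noindex directive.

    `headers` is a mapping of header names to values. Header names are
    matched case-insensitively; bot-prefixed directives are not handled.
    """
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives = [d.strip().lower() for d in value.split(",")]
            if "noindex" in directives:
                return True
    return False


# Hypothetical response headers for illustration.
with_tag = {"Content-Type": "text/plain", "X-Robots-Tag": "noindex, nofollow"}
without_tag = {"Content-Type": "text/plain"}
print(has_noindex(with_tag), has_noindex(without_tag))
```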
Where to Go from Here
- Automate deployment of llms.txt across all properties using your CMS or server configuration.
- Audit quarterly. LLM behavior evolves fast—track what’s changed.
Bottom line: llms.txt is a good idea in theory, but today’s bots don’t read it. Until adoption improves, your best defense remains robots.txt and a clear content policy backed by logs.
FAQ: Understanding llms.txt
What is llms.txt and who proposed it?
llms.txt is a proposed text file format that website owners can place at the root of their domain, for example https://example.com/llms.txt. The goal is to improve how LLMs discover and index a site's content.
Large language models increasingly rely on website information, but face a critical limitation: context windows are too small to handle most websites in their entirety. Converting complex HTML pages with navigation, ads, and JavaScript into LLM-friendly plain text is both difficult and imprecise.
Source: https://llmstxt.org/
In addition, Markdown (.md) files are used to provide raw-text versions of pages, which lets LLM bots crawl and read the content faster. This is especially important for JavaScript-heavy, client-side rendered sites.
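For sites where generating the file is cheap, it can be assembled from data the CMS already has. The following is a minimal sketch that follows the basic layout described at llmstxt.org (an H1 title, a blockquote summary, then H2 sections containing Markdown link lists). The `build_llms_txt` function, the site name, and the URLs are all hypothetical placeholders.

```python
def build_llms_txt(title, summary, sections):
    """Build llms.txt content.

    `sections` maps a section heading to a list of
    (link_text, url, description) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for text, url, description in links:
            lines.append(f"- [{text}]({url}): {description}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Placeholder values for illustration only.
content = build_llms_txt(
    "Example Site",
    "Short description of what the site covers.",
    {"Docs": [("Getting started", "https://example.com/start.md", "Setup guide")]},
)
print(content)
```

The output can then be written to the web root as part of a normal build or publish step.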
Why are llms.txt advocates wrong?
While well-meaning, the recommendation to add llms.txt overestimates the file's real-world effect. As shown in my log analysis, none of the major LLM crawlers (OpenAI’s GPTBot, Anthropic’s ClaudeBot, PerplexityBot, Meta’s crawler, etc.) currently request the llms.txt file. Apart from a handful of OpenAIBotSearch calls and SEO tools, only traditional search crawlers like GoogleBot and BingBot made any contact, and not for training purposes.
So while it may feel proactive, adding llms.txt today does almost nothing.
