LLMs.txt: Why AI Crawlers Ignore It (2025 Audit)

Updated: June 2026 · A new article on llms.txt has been published that extends my earlier write-up, llms.txt

This analysis reviews how llms.txt files are actually used by LLM crawlers.

How the analysis was performed: I audited 30 days of raw CDN logs for 1,000 Adobe Experience Manager domains to see who actually requests the file. The results were, frankly, brutal.

Findings of the LLMs.txt audit:

  • LLM-specific bots stayed away. No GPTBot, ClaudeBot, PerplexityBot, or similar were seen at all.
  • Google still probes everything. Its desktop crawler accounted for 95% of all hits.
  • Bing is curious but inconsistent. Only seven requests, all concentrated on one domain out of the 1,000.
  • OpenAI’s search bot was minimal. Ten calls from OpenAIBotSearch. GPTBot itself was absent.
  • SEO tools inflated the logs. Tools like Semrush Mobile and SiteAudit caused many hits, unrelated to LLMs.

Rank | User-agent       | Share of all llms.txt hits
-----|------------------|---------------------------
1    | GoogleBotDesktop | 94.9%
2    | OpenAIBotSearch  | 1.1%
3    | ScanPire         | 0.8%
4    | BingBot          | 0.8%
     | Eight other bots | <1% each
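
For readers who want to run a similar tally on their own logs, here is a minimal Python sketch. It is not the tooling used for this audit. It assumes log lines in the Apache/Nginx "combined" format (CDN exports usually differ), and the `BOT_LABELS` map and function names are hypothetical placeholders to adapt.

```python
import re
from collections import Counter

# Assumed log shape: Apache/Nginx "combined" format. Real CDN exports differ,
# so adapt this pattern to the fields your provider actually writes.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical substring -> label map; extend it with the bots you care about.
BOT_LABELS = {
    "Googlebot": "Googlebot",
    "bingbot": "BingBot",
    "GPTBot": "GPTBot",
    "OAI-SearchBot": "OAI-SearchBot",
    "ClaudeBot": "ClaudeBot",
    "PerplexityBot": "PerplexityBot",
}


def label_agent(agent: str) -> str:
    """Map a raw User-Agent string to a short label, or 'other'."""
    lowered = agent.lower()
    for needle, label in BOT_LABELS.items():
        if needle.lower() in lowered:
            return label
    return "other"


def llms_txt_share(lines):
    """Return {label: percent of /llms.txt hits}, rounded to one decimal."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        # Ignore query strings so /llms.txt?x=1 still counts as a hit.
        if match and match.group("path").split("?")[0] == "/llms.txt":
            hits[label_agent(match.group("agent"))] += 1
    total = sum(hits.values())
    if not total:
        return {}
    return {label: round(100 * n / total, 1) for label, n in hits.most_common()}
```

Feeding it an open log file, for example `llms_txt_share(open("access.log"))`, gives a share-per-bot dictionary comparable to the table above. User-Agent strings can be spoofed, so treat the labels as a first pass rather than proof of who was crawling.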

Why Aren’t LLMs going to the llms.txt file?

  1. The spec is still unofficial. No LLM lab has committed to honoring it yet.
  2. Most training uses pre-built datasets. Think Common Crawl or books, not live fetches.
  3. Robots.txt already covers them. Major labs honor standard tokens like GPTBot and ClaudeBot.
  4. It’s not cost-effective. Probing llms.txt on every domain wastes crawl budget.

What are my recommendations for site owners in relation to llms.txt?

This really depends on how difficult the llms.txt file is to implement. If it would be relatively easy to create the file, go for it. If it requires a large amount of resources, I’d recommend holding back until we clearly see benefits.

For example, this domain serves an llms.txt file at https://www.longato.ch/llms.txt because it was easy to implement.

  • Use robots.txt instead. It’s the only widely respected barrier today.
  • Watch your logs. Use tools like Grafana or BigQuery to detect AI crawlers directly.
    • Remember: if you use a CDN, you need to look at the CDN’s own logs.
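
To make the robots.txt recommendation concrete, here is a minimal sketch that checks a policy with Python’s standard `urllib.robotparser`. The policy text is only an illustration (it blocks GPTBot and ClaudeBot and allows everyone else), and the `allowed` helper is a hypothetical name. Adjust both to your own content policy.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy, not a recommendation for every site:
# block two AI crawler tokens and allow all other user agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("GPTBot", "ClaudeBot", "Googlebot"):
        print(bot, allowed(ROBOTS_TXT, bot, "https://example.com/page"))
```

A check like this only tells you what the file says. Whether a given crawler obeys it is something only your logs can confirm.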

What Might Change Soon for LLMs.txt

As of now (August 2025), there are no major announcements from LLM providers about llms.txt.

Provider            | Current stance on llms.txt        | Signal to watch
--------------------|-----------------------------------|-------------------------------------
OpenAI              | No support announced              | GPTBot documentation updates
Google / Gemini     | Monitors but uses Google-Extended | Revisions to Google-Extended policy
Microsoft / Copilot | Silent                            | Bing Blog crawler updates
Meta                | No mention                        | Meta crawler policy changes
Anthropic           | No mention                        | ClaudeBot UA policy

Is there any external validation of my findings?

Date | Key development | Who said / did it | Take-away
-----|-----------------|-------------------|----------
17 Jun 2025 | “FWIW no AI system currently uses llms.txt.” | John Mueller, Google, on Bluesky | Google confirms zero support and no immediate plans. (Search Engine Roundtable)
19 Jun 2025 | ScaleMath publishes an adoption-tracker deep-dive. | Independent analysts | Finds early enthusiasm among dev-doc sites but no proof of LLM consumption. (ScaleMath)
02 Jul 2025 | PPC Land headline: “llms.txt adoption stalls as major AI platforms ignore proposed standard”. | Industry press | OpenAI, Google, Anthropic still not honoring the file. (PPC Land)
22 Jul 2025 | Mueller advises adding X-Robots-Tag: noindex to llms.txt to avoid clutter in Google results. | Google | Tactical hygiene tip; doesn’t affect crawling behaviour. (Stan Ventures)
24 Jul 2025 | Logs show OpenAI’s crawler fetching llms.txt every ~15 min on some sites. Google’s Gary Illyes repeats “we won’t support it.” | Search Engine Roundtable | Anecdotal evidence OpenAI is testing discovery, not an official endorsement. (Search Engine Roundtable)
Late Jul 2025 | Server-log studies detect sporadic hits from other AI bots but no sustained utilisation. | ArcherEdu analytics | Suggests experiments, not production use. (archeredu.com)
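
The 22 Jul entry (serving llms.txt with an X-Robots-Tag: noindex header) is easy to picture in code. Below is a minimal sketch using Python’s built-in `wsgiref`. In practice this header is more likely set in your CDN or web server configuration. The names `app`, `serve`, `LLMS_TXT_BODY`, the placeholder body and the port are all assumptions made for illustration.

```python
from wsgiref.simple_server import make_server

# Placeholder file body; a real deployment would read the actual llms.txt.
LLMS_TXT_BODY = b"# Example Site\n\n> Placeholder summary.\n"


def app(environ, start_response):
    """Serve /llms.txt with a noindex hint; answer 404 for anything else."""
    if environ.get("PATH_INFO") == "/llms.txt":
        headers = [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(LLMS_TXT_BODY))),
            # Asks search engines not to list the file in their results.
            ("X-Robots-Tag", "noindex"),
        ]
        start_response("200 OK", headers)
        return [LLMS_TXT_BODY]
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"not found\n"]


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run a local test server (hypothetical host and port)."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` and requesting http://127.0.0.1:8000/llms.txt should show the header in the response. As the table notes, this affects indexing only, not whether bots crawl the file.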

Where to Go from Here

  • Automate deployment of llms.txt across all properties using your CMS or server configuration.
  • Audit quarterly. LLM behavior evolves fast, so track what has changed.

Bottom line: llms.txt is a good idea in theory, but today’s bots don’t read it. Until adoption improves, your best defense remains robots.txt and a clear content policy backed by logs.

FAQ: Understanding llms.txt

What is llms.txt and who proposed it?

llms.txt is a proposed text file format, put forward in September 2024 by Jeremy Howard of Answer.AI, that website owners can place at the root of their domain, e.g. https://example.com/llms.txt. The goal is to help LLMs discover and make sense of a site’s most important content.

Large language models increasingly rely on website information, but face a critical limitation: context windows are too small to handle most websites in their entirety. Converting complex HTML pages with navigation, ads, and JavaScript into LLM-friendly plain text is both difficult and imprecise.
Source: https://llmstxt.org/

In addition, Markdown (.md) files are used to provide raw-text versions of pages, which lets LLM bots crawl and read the content faster. This is especially important for JavaScript-heavy, client-side-rendered sites.
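
To show what such a file looks like, here is a minimal Python sketch that assembles one. It follows the layout described at llmstxt.org as I understand it: an H1 title, a blockquote summary, then H2 sections containing Markdown link lists. The function name, the section names and the example.com URLs are hypothetical.

```python
def build_llms_txt(title, summary, sections):
    """Assemble llms.txt text from a title, a summary and link sections.

    sections maps a heading to a list of (name, url, note) tuples;
    an empty note is simply omitted from the output line.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{name}]({url}){suffix}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical site data, for illustration only.
    print(build_llms_txt(
        "Example Site",
        "Placeholder summary.",
        {"Docs": [
            ("Getting started", "https://example.com/start.md", "Setup guide"),
            ("FAQ", "https://example.com/faq.md", ""),
        ]},
    ))
```

Generating the file from data the CMS already holds is what keeps the implementation cost low, which is the deciding factor in the recommendations above.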

Why are they wrong?

While well-meaning, this recommendation overestimates its real-world effect. As shown in my log analysis, none of the major LLM crawlers (OpenAI’s GPTBot, Anthropic’s ClaudeBot, PerplexityBot, Meta’s crawler, etc.) currently request the llms.txt file. Apart from a handful of calls from OpenAI’s search bot, only traditional search crawlers like GoogleBot or BingBot made contact, and not for training purposes.

So while it may feel proactive, adding llms.txt today does almost nothing.
