LLMs.txt: Why AI Crawlers Ignore It (2025 Audit)

Updated: June 2026 · A new article on LLMs.txt has been published; it extends my earlier write-up on llms.txt.

This analysis reviews whether LLM crawlers actually request llms.txt files.

How the analysis was performed: I audited 30 days of raw CDN logs for 1,000 Adobe Experience Manager domains to see who actually requests the file. The results were, frankly, brutal.

Findings of the LLMs.txt audit:

LLM-specific bots stayed away. No GPTBot, ClaudeBot, PerplexityBot, or similar were seen at all.
Google still probes everything. Its desktop crawler accounted for about 95% of all hits.
Bing is curious but inconsistent. It made only seven requests, all concentrated on one domain out of 1,000.
OpenAI’s search bot was minimal. Ten calls from OpenAIBotSearch. GPTBot itself was absent.
SEO tools inflated the logs. Tools like Semrush Mobile and SiteAudit caused many hits, unrelated to LLMs.

Rank	User-agent	Share of all llms.txt hits
1	GoogleBotDesktop	94.9%
2	OpenAIBotSearch	1.1%
3	ScanPire	0.8%
4	BingBot	0.8%
…	Eight other bots	<1% each

Why Aren’t LLMs Requesting the llms.txt File?

The spec is still unofficial. No LLM lab has committed to honoring it yet.
Most training uses pre-built datasets, such as Common Crawl or books, not live fetches.
Robots.txt already covers them. Major labs honor standard tokens like GPTBot and ClaudeBot.
It’s not cost-effective. Probing llms.txt on every domain wastes crawl budget.
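
To make the robots.txt point concrete, here is a minimal sketch that uses Python's standard `urllib.robotparser` to check what a set of rules would allow. The rules in `ROBOTS_TXT`, the `allowed` helper, and the example URL are illustrative assumptions, not the configuration of any site in the audit. The check only shows what the file says; whether a given crawler obeys it is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two AI crawler tokens, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "Googlebot"):
        print(agent, allowed(agent, "https://example.com/docs/page.html"))
```

Running the same check against your live robots.txt is a quick way to confirm the tokens are spelled the way you intended.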

What are my recommendations for site owners in relation to llms.txt?

This really depends on how difficult llms.txt is to implement. If creating the file would be relatively easy, go for it. If it requires a large amount of resources, I’d recommend holding back until we clearly see benefits.

For example, this domain serves an llms.txt file at https://www.longato.ch/llms.txt because it was easy to implement.

Use robots.txt instead. It’s the only widely respected barrier today.
Watch your logs. Use tools like Grafana or BigQuery to detect AI crawlers directly.
- Remember, if you use a CDN, you need to look at the CDN’s own logs.
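
If you want to run a check like this on your own logs, the sketch below counts llms.txt requests per user agent and turns the counts into shares. It is not the pipeline used for the audit. It assumes the widely used "combined" access-log format, and the sample lines and bot names (`ExampleBot`, `OtherBot`) are made up. Real CDN logs often use different fields, so the regular expression would need adjusting.

```python
import re
from collections import Counter

# Assumption: "combined" log format, with the request line and the user agent
# both in double quotes. Adjust the pattern for your CDN's export format.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def llms_txt_hits(lines):
    """Count requests for /llms.txt per user-agent string."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("path").split("?")[0] == "/llms.txt":
            hits[match.group("agent")] += 1
    return hits


def shares(hits):
    """Return (agent, count, percent of all llms.txt hits), most frequent first."""
    total = sum(hits.values())
    return [(agent, n, round(100 * n / total, 1)) for agent, n in hits.most_common()]


# Made-up sample lines, for illustration only.
SAMPLE = [
    '203.0.113.5 - - [01/Aug/2025:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [01/Aug/2025:10:05:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [01/Aug/2025:10:06:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "OtherBot/2.0"',
    '198.51.100.7 - - [01/Aug/2025:10:07:00 +0000] "GET /index.html HTTP/1.1" 200 900 "-" "OtherBot/2.0"',
]

if __name__ == "__main__":
    for agent, count, percent in shares(llms_txt_hits(SAMPLE)):
        print(f"{agent}\t{count}\t{percent}%")
```

Grouping by the raw user-agent string keeps the sketch simple. In practice you would probably normalise agents to bot families, and verify the IP ranges, before drawing conclusions.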

What Might Change Soon for LLMs.txt

As of August 2025, no major LLM provider has announced support for llms.txt.

Provider	Current stance on llms.txt	Signal to watch
OpenAI	No support announced	GPTBot documentation updates
Google / Gemini	Monitors but uses Google-Extended	Revisions to Google-Extended policy
Microsoft / Copilot	Silent	BingBlog crawler updates
Meta	No mention	Meta crawler policy changes
Anthropic	No mention	ClaudeBot UA policy

Is there any external validation of my findings?

Date	Key development	Who said / did it	Takeaway
17 Jun 2025	“FWIW no AI system currently uses llms.txt.”	John Mueller, Google, on Bluesky	Google confirms zero support and no immediate plans. (Search Engine Roundtable)
19 Jun 2025	ScaleMath publishes an adoption-tracker deep-dive.	Independent analysts	Finds early enthusiasm among dev-doc sites but no proof of LLM consumption. (ScaleMath)
02 Jul 2025	PPC Land headline: “llms.txt adoption stalls as major AI platforms ignore proposed standard”.	Industry press	OpenAI, Google, Anthropic still not honoring the file. (PPC Land)
22 Jul 2025	Mueller advises adding `X-Robots-Tag: noindex` to llms.txt to avoid clutter in Google results.	Google	Tactical hygiene tip; doesn’t affect crawling behaviour. (Stan Ventures)
24 Jul 2025	Logs show OpenAI’s crawler fetching llms.txt every ~15 min on some sites. Google’s Gary Illyes repeats “we won’t support it.”	Search Engine Roundtable	Anecdotal evidence OpenAI is testing discovery, not an official endorsement. (Search Engine Roundtable)
Late Jul 2025	Server-log studies detect sporadic hits from other AI bots but no sustained utilisation.	ArcherEdu analytics	Suggests experiments, not production use. (archeredu.com)
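
The `X-Robots-Tag: noindex` tip from the 22 Jul entry is normally applied in web server or CDN configuration. As a language-neutral illustration of what the response should look like, here is a minimal WSGI sketch in Python. The `app` function, the `LLMS_TXT_BODY` content, and the routing are hypothetical. Only the header name and value come from the tip itself.

```python
# Hypothetical llms.txt content, for illustration only.
LLMS_TXT_BODY = b"# Example Site\n\n> Placeholder summary for an llms.txt file.\n"


def app(environ, start_response):
    """Minimal WSGI app: serve /llms.txt with a noindex header, 404 otherwise."""
    if environ.get("PATH_INFO") == "/llms.txt":
        headers = [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(LLMS_TXT_BODY))),
            # Asks search engines not to index the file itself.
            ("X-Robots-Tag", "noindex"),
        ]
        start_response("200 OK", headers)
        return [LLMS_TXT_BODY]
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not found\n"]
```

For a local check, the app can be served with `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` and inspected with `curl -I http://127.0.0.1:8000/llms.txt`.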

Where to Go from Here

Automate deployment of llms.txt across all properties using your CMS or server configuration.
Audit quarterly. LLM crawler behavior evolves fast, so track what has changed.

Bottom line: llms.txt is a good idea in theory, but today’s bots don’t read it. Until adoption improves, your best defense remains robots.txt and a clear content policy backed by logs.

FAQ: Understanding llms.txt

What is llms.txt and who proposed it?

llms.txt is a proposed text file format, put forward by Jeremy Howard in September 2024, that website owners can place at the root of their domain (for example, https://example.com/llms.txt). The goal is to help LLMs discover and index a site’s content.

Large language models increasingly rely on website information, but face a critical limitation: context windows are too small to handle most websites in their entirety. Converting complex HTML pages with navigation, ads, and JavaScript into LLM-friendly plain text is both difficult and imprecise.
Source: https://llmstxt.org/

In addition, Markdown (.md) files are used to provide raw-text versions of pages, which lets LLM bots crawl and read the content faster. This is especially important for JavaScript-heavy, client-side rendered sites.
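
To show what such a file looks like, here is a small sketch that renders a minimal llms.txt body. It follows the layout described at llmstxt.org as I understand it: an H1 title, a blockquote summary, and H2 sections containing Markdown link lists. The `build_llms_txt` helper, the section name, and the URLs are hypothetical.

```python
def build_llms_txt(title, summary, sections):
    """Render a minimal llms.txt body.

    sections maps a section heading to a list of (name, url, note) tuples;
    note may be an empty string.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            entry = f"- [{name}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical site structure, for illustration only.
    print(build_llms_txt(
        "Example Site",
        "Short description of what the site covers.",
        {"Docs": [("Guide", "https://example.com/guide.md", "Getting started")]},
    ))
```

Generating the file from a CMS export or sitemap like this keeps the implementation cost low, which is the main condition under which I would bother with it today.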

Why are they wrong?

While well-meaning, the recommendation to add llms.txt overestimates its real-world effect. As shown in my log analysis, none of the major LLM crawlers (OpenAI’s GPTBot, Anthropic’s ClaudeBot, PerplexityBot, Meta’s crawler, etc.) currently request the llms.txt file. Apart from a handful of requests from OpenAI’s search bot, only traditional search crawlers like GoogleBot or BingBot made any contact, and not for training purposes.

So while it may feel proactive, adding llms.txt today does almost nothing.

Continue the conversation:

Reddit: https://www.reddit.com/r/SEO/comments/1moss0s/llmstxt_why_almost_every_ai_crawler_ignores_it_as/
