Category: Generative Engine Optimization Course

  • LLMs.txt – What You Need to Know: The Largest Audit to Date from Adobe AEM

    Published: June 2026 · longato.ch

    Companion piece: this article updates and extends my earlier write-up, llms.txt: my recommendation, August 2025.


    The six findings you can quote

    “Create llms.txt because it is cheap and Google is now looking at it, not because it will get you cited in ChatGPT today.”

    “Across 22,494 recorded requests to /llms.txt over a 30-day window, agents that are verifiably large language models accounted for 258 hits, which is 1.1% of all traffic to the file.”

    “The single biggest change since my August 2025 audit is Googlebot. It is now the largest named crawler hitting /llms.txt, with 1,219 recorded requests.”

    “92.2% of all /llms.txt traffic came from agents that are neither mainstream search engines nor verifiable LLMs. The file’s main audience today is SEO tooling, monitoring services, and AI-readiness auditors inspecting the file, not models consuming it.”

    “OpenAI’s user-facing and search agents, OAI-SearchBot and ChatGPT-User, generated 209 hits across roughly 69 hosts. That is the totality of OpenAI’s interest in /llms.txt in this dataset.”

    “In a direct referrer analysis I found zero requests anywhere in the logs, search bots included, that carried /llms.txt as their referrer. Whatever crawlers do after reading the file, they do not arrive at other URLs from it in any way the logs can see.”

    What changed since August 2025

    My August 2025 analysis examined the same question on the same kind of footprint. The qualitative shift over the intervening period is best shown side by side.

    August 2025 against June 2026

    Dimension | August 2025 (prior analysis) | June 2026 (this audit) | Direction of change
    Googlebot hitting /llms.txt | Not a meaningful presence | 1,219 hits, the largest named crawler at the file | Major increase
    Verifiable LLM hits to /llms.txt | Negligible | 258 hits, 1.1% of all traffic | Still negligible as a share
    OpenAI-specific interest | Minimal | 209 hits from OAI-SearchBot and ChatGPT-User, about 69 hosts | Slightly up, still tiny
    Dominant traffic source | Already non-LLM | Other / unverified tooling at 92.2% | The bucket has grown and professionalised
    Self-labelled audit and readiness bots | Emerging | 60.1% of all traffic | New, large category
    Referrals originating from llms.txt | None observed | Still none observed | Unchanged
    Crawler entry point | Homepage-led | Homepage-led | Unchanged

    Sources: my prior published analysis from August 2025 for the “before” column, and Datasets C and D plus the referrer analysis for the “after” column.

    The most material change is Googlebot’s arrival at /llms.txt in volume. This is consistent with a wider observation in the SEO community. Martina Raissle has noted publicly on LinkedIn that Google has begun including llms.txt in its Lighthouse checks, which is itself a signal that the file is at least on Google’s radar.

    I want to be careful about what this does and does not prove. Googlebot fetching a URL is not proof that the content is used for ranking, AI Overviews, or AI Mode. A fetch is a fetch. But it is a clear change from a year ago, and combined with the Lighthouse inclusion, it is the first concrete sign from a major provider that llms.txt is being looked at rather than ignored. I weight this as worth acting on cheaply, not as proven to work, and my recommendation below reflects that.


    My recommendation

    This is my professional judgement, grounded in the data above.

    Recommendation summary

    # | Recommendation | Supporting evidence | Confidence
    1 | Create the llms.txt file | Googlebot is now the largest named crawler at the file, 1,219 hits; Google has added it to Lighthouse checks | Moderate
    2 | Treat it as low-effort insurance, not a growth lever | Generating the file is cheap; the return is asymmetric if providers do begin to use it | High, on the cost logic
    3 | Do not expect it to move LLM brand visibility or citations today | Verifiable LLMs account for 1.1% of hits; no referrer trail exists | High
    4 | Keep investing in homepage strength and internal linking | Crawlers enter via the homepage and follow links | High
    5 | Watch Google AI Mode and AI Overviews specifically | Google’s fetching plus Lighthouse inclusion is the only mover in a year; impact there is plausible but unproven | Low, speculative

    In plain terms: create the file, because Google is now hitting it, and that alone changes the calculus from a year ago. The effort is minimal, so the return on investment is favourable if the providers do in fact consume it; you are buying a cheap option on an uncertain upside. Will it move LLM brand visibility or citations? Probably not, not yet. The traditional consumer LLMs such as ChatGPT are not meaningfully using the file on this evidence, and the honest answer is that the consumption simply is not there at the scale that would move citations. Will it affect Google’s AI Mode? Maybe. Google is the one provider showing changed behaviour. I would not bet the strategy on it, but I would not ignore it either.


    What is llms.txt?

    llms.txt is a proposed Markdown file placed at the root of a domain, for example https://example.com/llms.txt. The llmstxt.org proposal frames it as a curated, machine-readable map: a short summary of the site plus a hand-picked list of the most important pages, often with companion .md versions of those pages, so that a large language model can find and ingest the high-value content without crawling the entire site or fighting through navigation, scripts, and boilerplate. The analogy its proponents draw is to robots.txt and sitemap.xml: a small, conventional file at a predictable path that machines can rely on. The crucial difference is that robots.txt and sitemap.xml are honoured by documented, identifiable crawlers, whereas llms.txt only delivers value if the LLM providers choose to read it. Whether they do is precisely the question this audit set out to answer with logs rather than opinion.
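
    To make the format concrete, here is a minimal Python sketch that assembles an llms.txt body in the layout the llmstxt.org proposal describes: an H1 title, a blockquote summary, then H2 sections containing Markdown link lists. The function name, the site name, and the URL are hypothetical placeholders, not taken from any audited site.

```python
def build_llms_txt(title, summary, sections):
    """Return llms.txt content: H1 title, blockquote summary, H2 link lists.

    `sections` maps a section heading to a list of (name, url, note) tuples.
    All names and URLs passed in are the caller's own; nothing is fetched.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            entry = f"- [{name}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    # Trim trailing blank lines and finish with a single newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical example content for illustration only.
example = build_llms_txt(
    "Example Site",
    "Product documentation and guides for Example Site.",
    {"Docs": [("Getting started", "https://example.com/docs/start.md", "setup guide")]},
)
print(example)
```

    Serve the result at /llms.txt as plain text. Keeping the list short and hand-picked is the point of the proposal; a dump of every URL would simply duplicate sitemap.xml.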


    Why I ran this LLMs.txt audit

    Two pressures converged.

    The first was a recurring question from customers. I was being asked, on a roughly weekly cadence, whether llms.txt was actually being used, and whether it was worth the effort of generating and maintaining. That is a fair question, and it deserves a data-backed answer rather than a shrug.

    The second was the state of the GEO and AEO conversation. The generative-engine-optimisation and answer-engine-optimisation community has been circulating a lot of confident, contradictory, and frequently unsourced claims about llms.txt: that the major models definitely read it, that it definitely boosts citations, or conversely that it is completely ignored. Both extremes tend to be asserted without server logs to back them. The only responsible move was to look at what bots actually do at the file, at scale.

    This is, to my knowledge, the largest single llms.txt server-log and crawl audit conducted to date by number of distinct domains and by volume of bot traffic examined. The domains analysed are real customer sites hosted on Adobe Experience Manager, and they include some of the world’s largest websites, which is what makes the bot behaviour observed here representative rather than anecdotal.

    “Most public claims about llms.txt are made without real analysis. This audit is my attempt to replace assertion with measurement, at the largest domain scale I am aware of.”


    Methodology, scope, and caveats

    Here is the setup in full so that the findings can be challenged or replicated.

    Working with a server log file analysis tool, plus a large-scale crawl of /llms.txt paths, I assembled four datasets:

    Dataset | Purpose | Rows | Key fields
    A, domain scope log | Which hosts received bot traffic, and how many distinct bots and agents each saw | 6,122 hosts | origin_host, hits, distinct_bots, distinct_agents, first_seen, last_seen
    B, llms.txt existence crawl | Whether /llms.txt actually resolves on each host, and what it returns | 5,553 crawl rows (4,819 distinct URLs, 4,685 distinct hosts) | Address, Status Code, Content Type, Word Count, Size (Bytes), Crawl Timestamp
    C, llms.txt hits by host and agent | Every recorded request to /llms.txt, split by host and full user-agent string | 6,749 rows | Host, request_user_agent, hits
    D, llms.txt hits by agent type | The same hit volume, pre-classified by agent family | 237 rows | User Agent Type, User Agent Name, Full User Agent, Hits

    The hit data in Datasets C and D covers a 30-day window. The crawl in Dataset B carries crawl timestamps dated 29 May 2026.

    The four questions I set out to answer were:

    1. How many domains have a live llms.txt file?
    2. When an LLM reads llms.txt, does it then crawl the .md files it lists?
    3. How are LLMs actually finding the pages they crawl?
    4. Are there any referrals coming from llms.txt?

    A few caveats, stated openly:

    User-agent strings are self-declared. Any bot can claim to be anything. I classify “verifiable LLM” conservatively, counting only agents that match the documented user agents of known model providers such as OpenAI, Anthropic, Perplexity, and You.com. Hits in the “Other / unverified” bucket may include real AI activity behind generic strings, but I will not count what I cannot verify.
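
    A conservative classifier of this kind can be sketched as a simple token lookup. The token lists below are assumptions assembled from the agent names reported in this article, not the classification rules actually used for Dataset D, and substring matching on a self-declared string is still not proof of identity.

```python
# Hypothetical token maps, built from agent names that appear in this article.
VERIFIABLE_LLM_TOKENS = {
    "oai-searchbot": "OpenAI",
    "chatgpt-user": "OpenAI",
    "gptbot": "OpenAI",
    "claudebot": "Anthropic",
    "perplexitybot": "Perplexity",
    "perplexity-user": "Perplexity",
    "youbot": "You.com",
}
SEARCH_ENGINE_TOKENS = {
    "googlebot": "Google",
    "bingbot": "Microsoft / Bing",
    "baiduspider": "Baidu",
}


def classify_user_agent(user_agent):
    """Return (agent_type, operator_family) for a raw user-agent string."""
    ua = (user_agent or "").lower()
    for token, family in VERIFIABLE_LLM_TOKENS.items():
        if token in ua:
            return ("LLM / AI (verifiable)", family)
    for token, family in SEARCH_ENGINE_TOKENS.items():
        if token in ua:
            return ("Search engine", family)
    # Anything that matches no documented token stays uncounted as AI.
    return ("Other / unverified", "Unknown")


print(classify_user_agent("Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"))
```

    The fall-through to "Other / unverified" is the design choice that matters: an unknown string is never promoted to the LLM bucket, which under-counts AI activity rather than over-claiming it.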

    Datasets C and D contain no per-event timestamp column. The 30-day window is the query window the data was extracted under; it is not re-derivable from inside the files.

    Dataset A’s first_seen and last_seen values span a short capture interval, about five minutes on 28 May 2026, which tells me these are sampling markers from one extract rather than the full 30-day span. I therefore use Dataset A only for structural facts such as host counts and bot diversity per host, and never to infer time-based volume.

    The tables below are summary tables. I am not releasing the raw logs. The figures are reproducible in principle by anyone running the same crawl and the same log query.


    How many domains actually have an llms.txt file?

    This is where precision matters most, because “has an llms.txt” is not a single thing. A request to /llms.txt can return a real Markdown file, a redirect, a 404, a soft-200 HTML page, or an empty 200. I broke Dataset B down by HTTP status.

    HTTP status of /llms.txt: 5,553 crawl rows covering 4,685 distinct hosts

    Status code | Meaning | Crawl rows | Share of rows
    404 | Not found (no file) | 4,270 | 76.9%
    301 | Permanent redirect | 606 | 10.9%
    200 | OK (file served) | 175 | 3.2%
    403 | Forbidden | 174 | 3.1%
    302 | Temporary redirect | 149 | 2.7%
    0 | No response or connection failure | 90 | 1.6%
    401 | Unauthorised | 47 | 0.8%
    406 | Not acceptable | 28 | 0.5%
    (blank) | No status captured | 12 | 0.2%
    410 | Gone | 1 | under 0.1%
    307 | Temporary redirect | 1 | under 0.1%
    Total | | 5,553 | 100%

    Source: Dataset B, Status Code column. The row count includes 734 duplicate URLs, which I deduplicated before counting hosts.
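
    The share column can be re-derived from the row counts. The short Python check below uses only the counts published in the status table; the variable names are illustrative.

```python
# Crawl rows per HTTP status, copied from the status table in this article.
status_rows = {
    "404": 4270, "301": 606, "200": 175, "403": 174, "302": 149,
    "0": 90, "401": 47, "406": 28, "(blank)": 12, "410": 1, "307": 1,
}

total_rows = sum(status_rows.values())
# Share of all crawl rows, as a percentage rounded to one decimal place.
shares = {status: round(100 * count / total_rows, 1)
          for status, count in status_rows.items()}

print(total_rows)
for status, share in shares.items():
    print(status, share)
```

    Note that these are shares of crawl rows, duplicates included, which is why host-level figures are reported separately.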

    A 200 response is necessary but not sufficient to call something a real llms.txt. Many 200s are HTML catch-all pages or empty bodies. So I tightened the definition in two further steps.

    How many of the 200 responses are genuinely an llms.txt file?

    Definition (progressively stricter) | Distinct hosts | Share of 4,685 probed | Share of 6,122 scope-file hosts
    Any HTTP 200 at /llms.txt | 137 | 2.92% | 2.24%
    200 and Content-Type: text/plain | 111 | 2.37% | 1.81%
    200 and word count above zero | 20 | 0.43% | 0.33%

    Source: Dataset B, Status Code plus Content Type plus Word Count columns.

    Depending on how strictly you define “has a working llms.txt”, the answer ranges from 137 hosts for any 200, down to 111 hosts for files served as plain text, and as low as 20 hosts for plain-text files with actual measurable content. The 23 responses that returned a 200 with an HTML content type are almost certainly not real llms.txt files at all.
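
    The three definitions can be expressed as nested filters over the crawl rows. This is a sketch under assumptions: it reads the column names listed for Dataset B, treats the definitions as cumulative (plain text, then plain text with content), and uses sets so that duplicate crawl rows collapse to distinct hosts. The sample rows are invented.

```python
from urllib.parse import urlsplit


def count_hosts_by_definition(crawl_rows):
    """Count distinct hosts under three progressively stricter definitions.

    Each row is a dict with 'Address', 'Status Code', 'Content Type'
    and 'Word Count' keys.
    """
    any_200, plain_text, with_content = set(), set(), set()
    for row in crawl_rows:
        if str(row.get("Status Code")) != "200":
            continue
        host = urlsplit(row["Address"]).hostname or ""
        any_200.add(host)
        content_type = (row.get("Content Type") or "").lower()
        if content_type.startswith("text/plain"):
            plain_text.add(host)
            if int(row.get("Word Count") or 0) > 0:
                with_content.add(host)
    return {
        "any_200": len(any_200),
        "plain_text": len(plain_text),
        "plain_text_with_content": len(with_content),
    }


# Invented sample rows: one real file (crawled twice), one soft-200 HTML
# page, one empty plain-text 200, and one 404.
sample_rows = [
    {"Address": "https://a.example/llms.txt", "Status Code": 200,
     "Content Type": "text/plain; charset=utf-8", "Word Count": 40},
    {"Address": "https://a.example/llms.txt", "Status Code": 200,
     "Content Type": "text/plain; charset=utf-8", "Word Count": 40},
    {"Address": "https://b.example/llms.txt", "Status Code": 200,
     "Content Type": "text/html", "Word Count": 300},
    {"Address": "https://c.example/llms.txt", "Status Code": 200,
     "Content Type": "text/plain", "Word Count": 0},
    {"Address": "https://d.example/llms.txt", "Status Code": 404,
     "Content Type": "text/html", "Word Count": 12},
]
counts = count_hosts_by_definition(sample_rows)
print(counts)
```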

    “Of 4,685 domains probed, only 137 returned a 200 at /llms.txt. Tighten the definition to plain text with real content and the number collapses to 20. Adoption is not just low, much of the apparent adoption is hollow.”

    Data-quality notes for the existence crawl

    Issue | Detail | How I handled it
    Duplicate URLs | 5,553 rows but 4,819 distinct addresses, so 734 duplicate rows | Deduplicated to distinct hosts before counting
    Soft-200 HTML | 23 of 175 200-responses were text/html, not a text file | Excluded from the strict definitions
    Empty 200s | 155 of 175 200-responses had a word count of zero | Reported separately and flagged as likely empty or placeholder
    Word-count range on real files | The 20 non-empty files ran from 2 to 69 words | Reported; even the “real” files are extremely short

    A word count between 2 and 69 on the files that do have content tells me most of these are minimal stubs, a title and a couple of links, rather than the rich, curated index the llmstxt.org proposal envisions. Adoption is shallow on both axes: few sites have the file, and few of those have populated it meaningfully.


    Do LLMs crawl the .md files, and are there any referrals from llms.txt?

    These two questions share one answer, and it comes from a direct analysis of the referrer field in the logs.

    I did not find a single request anywhere in the server logs whose referrer was a /llms.txt URL. This held across all bot types, search engines included, not only LLM agents.

    There are two possible explanations, and the logs alone cannot distinguish between them. Either the bots do not crawl immediately: they may read llms.txt, archive or queue what they find, and crawl later in a separate session that carries no referrer linking back to the file. Or the referrer is simply not preserved: bots may crawl the listed .md files but not populate the referrer header with the llms.txt URL.

    Either way, the practical consequence is the same. There is no observable evidence in the logs that llms.txt is functioning as a crawl-routing hub. If llms.txt were doing the job its proposal describes, feeding models a list of URLs that they then fetch, I would expect to see at least some referrer trail. I see none.
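
    For anyone repeating the referrer check on their own logs, a minimal Python sketch follows. It assumes each log record is a dict with a `referrer` field, which is a hypothetical field name; real log schemas differ, and a missing referrer is often logged as "-".

```python
from urllib.parse import urlsplit


def is_llms_txt_referrer(referrer):
    """True when the referrer URL's path is exactly /llms.txt."""
    if not referrer or referrer == "-":
        return False
    return urlsplit(referrer).path.lower() == "/llms.txt"


def count_llms_txt_referrals(log_records):
    """Count requests that arrived with /llms.txt as their referrer."""
    return sum(1 for record in log_records
               if is_llms_txt_referrer(record.get("referrer")))


# Invented records for illustration only.
records = [
    {"path": "/docs/start.md", "referrer": "https://example.com/llms.txt"},
    {"path": "/products", "referrer": "https://example.com/"},
    {"path": "/about", "referrer": "-"},
]
print(count_llms_txt_referrals(records))
```

    A count of zero from a check like this shows only that no routing is visible in the logs; it cannot separate delayed crawling from a dropped referrer header.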


    How are LLMs actually finding pages to crawl?

    From the same referrer analysis: when bot requests did carry a referrer, it was, in the overwhelming majority of cases, the homepage of the domain.

    The behavioural picture is that crawlers, including AI crawlers, predominantly enter a site at the homepage and discover the rest of the site by following links from there, exactly as classical web crawlers always have. They are not, on this evidence, entering via llms.txt and fanning out from its curated list. The homepage and its internal linking remain the primary discovery surface. This is a strong argument that the fundamentals of crawlability and internal linking still matter far more than a curated llms.txt for getting your content seen.
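
    The homepage share among referred requests can be measured in the same spirit. In this sketch the `referrer` field name is again an assumption, and "homepage" is defined narrowly as a referrer whose path is empty or "/".

```python
from urllib.parse import urlsplit


def homepage_referrer_share(log_records):
    """Share of referred requests whose referrer is a domain homepage."""
    referrers = [record.get("referrer") for record in log_records]
    # Keep only requests that actually carried a referrer.
    referred = [ref for ref in referrers if ref and ref != "-"]
    if not referred:
        return 0.0
    from_homepage = sum(1 for ref in referred
                        if urlsplit(ref).path in ("", "/"))
    return from_homepage / len(referred)


# Invented records for illustration only.
bot_requests = [
    {"referrer": "https://example.com/"},
    {"referrer": "https://example.com"},
    {"referrer": "https://example.com/products"},
    {"referrer": "-"},
    {"referrer": None},
]
share = homepage_referrer_share(bot_requests)
print(share)
```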

    “On the referrer evidence, AI crawlers behave like classical crawlers. They enter at the homepage and follow links. llms.txt is not the front door.”


    Who is actually hitting llms.txt? The 22,494-hit breakdown

    This is the heart of the audit. Dataset D pre-classifies every recorded hit by agent family, and Dataset C lets me verify that classification against the raw user-agent strings. The two reconcile to the same total, 22,494 against 22,493, a one-hit difference from how the two extracts were generated.

    /llms.txt hits by agent type, 30-day window

    User-agent type | Hits | Share
    Other / unverified | 20,746 | 92.2%
    Search engine | 1,434 | 6.4%
    LLM / AI (verifiable) | 258 | 1.1%
    SEO / crawlers (declared) | 36 | 0.2%
    Dataset / training | 13 | 0.1%
    Social / preview | 7 | under 0.1%
    Total | 22,494 | 100%

    Source: Dataset D, User Agent Type by Hits.

    Hits by named agent (the agents that are identifiable)

    Named agent | Operator family | Hits
    Googlebot | Search engine | 1,219
    OAI-SearchBot | OpenAI | 153
    BaiduSpider | Search engine | 127
    ChatGPT-User | OpenAI | 56
    Amazonbot | E-commerce / AI | 38
    Bingbot | Search engine | 36
    GPTBot | OpenAI (training) | 33
    AhrefsBot | SEO tool | 28
    Applebot | Search / AI | 13
    Bytespider | ByteDance | 12
    ClaudeBot | Anthropic | 10
    SemrushBot | SEO tool | 6
    Facebook External Hit | Social preview | 5
    PerplexityBot | Perplexity | 4
    Meta ExternalAgent | Meta | 2
    Perplexity-User | Perplexity | 1
    YouBot | You.com | 1
    CCBot | Common Crawl | 1

    Source: Dataset D, User Agent Name by Hits, excluding the “Unknown” aggregate of 20,746.

    The verifiable LLM/AI agents in full

    LLM/AI agent | Hits
    OAI-SearchBot (OpenAI) | 153
    ChatGPT-User (OpenAI) | 56
    GPTBot (OpenAI training) | 33
    ClaudeBot (Anthropic) | 10
    PerplexityBot (Perplexity) | 4
    Perplexity-User (Perplexity) | 1
    YouBot (You.com) | 1
    Total verifiable LLM/AI | 258

    Source: Dataset D, User Agent Type = LLM / AI.

    “Strip out the search engines and the unverifiable bots, and the entire verifiable-LLM interest in llms.txt, across a 30-day window on thousands of domains, amounts to 258 requests. Anthropic, Perplexity, and You.com combined: 16.”

    What is the 92% actually made of?

    The unverified bulk deserves scrutiny rather than a dismissive label. Using Dataset C’s raw user-agent strings, I found that it is dominated by a long tail of self-described tooling: site-statistics bots, monitoring bots, SEO site-audit crawlers, and a striking number of agents whose own user-agent strings advertise that they exist to audit or check llms.txt and AI-readiness.

    Composition of /llms.txt traffic by operator family (raw-string classification)

    Operator family | Hits | Share | Distinct hosts touched
    Other / unverified (tooling, monitors, auditors) | 20,772 | 92.3% | 3,134
    Google | 1,227 | 5.5% | 319
    OpenAI | 242 | 1.1% | 69
    Baidu | 127 | 0.6% | 36
    Amazon | 38 | 0.2% | 12
    Microsoft / Bing | 35 | 0.2% | 20
    Apple | 13 | 0.1% | 13
    ByteDance | 12 | 0.1% | 5
    Anthropic | 12 | 0.1% | 11
    Meta | 8 | under 0.1% | 4
    Perplexity | 5 | under 0.1% | 5
    You.com | 1 | under 0.1% | 1
    Common Crawl | 1 | under 0.1% | 1

    Source: Dataset C, full user-agent strings classified by operator. Minor differences from the agent-type table reflect the raw-string method counting AdsBot-Google and similar agents under their parent family.

    Two concentration facts stand out. The top ten user-agent strings alone accounted for 17,569 of 22,493 hits, which is 78.1% of all traffic to the file. And agents whose user-agent string self-labels with terms such as audit, monitor, readiness, llms.txt, crawler, GEO, or research represented 105 distinct agents and 13,508 hits, which is 60.1% of all traffic.
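
    Both concentration measures are straightforward to compute from a table of hits per user-agent string. The sketch below is illustrative: the term list mirrors the one quoted above, the regular expression is an assumed implementation (with "geo" matched only as a standalone token to avoid words such as "geolocation"), and the sample agents are invented.

```python
import re

# Assumed pattern for self-labelling terms; "geo" must stand alone.
SELF_LABEL_PATTERN = re.compile(
    r"audit|monitor|readiness|llms\.txt|crawler|research|(?<![a-z])geo(?![a-z])"
)


def self_labelled_share(hits_by_agent):
    """Share of hits from agents whose string contains a self-label term."""
    total = sum(hits_by_agent.values())
    if total == 0:
        return 0.0
    labelled = sum(hits for agent, hits in hits_by_agent.items()
                   if SELF_LABEL_PATTERN.search(agent.lower()))
    return labelled / total


def top_n_share(hits_by_agent, n=10):
    """Share of hits contributed by the n busiest user-agent strings."""
    total = sum(hits_by_agent.values())
    if total == 0:
        return 0.0
    return sum(sorted(hits_by_agent.values(), reverse=True)[:n]) / total


# Invented user-agent strings and hit counts for illustration only.
sample_hits = {
    "ExampleSiteAuditBot/1.0": 60,
    "llms.txt-checker/2.0": 20,
    "Mozilla/5.0 (compatible; Googlebot/2.1)": 15,
    "curl/8.0": 5,
}
print(self_labelled_share(sample_hits), top_n_share(sample_hits, n=2))
```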

    “60% of all traffic to llms.txt came from agents that openly describe themselves as auditors, monitors, or readiness-checkers. The file’s biggest use case right now is being inspected to see whether it exists, a self-referential market rather than consumption by models.”

    This is the most under-reported reality of llms.txt in mid-2026. Raw hit counts on the file are rising, and it is tempting to read that as LLMs adopting it. The composition says otherwise. A large share of the traffic is the GEO ecosystem checking itself: tools verifying that a customer has the file, monitors polling for changes, readiness-scanners selling the idea that the file matters. That activity is real, but it is not evidence that any model is using the file to answer questions.


    Host-level reality check

    Beyond raw hits, I cross-referenced which hosts have a real file against which hosts received any /llms.txt traffic.

    Hosts: file presence against received traffic

    Measure | Count
    Hosts returning 200 at /llms.txt (www-normalised) | 130
    Hosts that received at least one /llms.txt request (www-normalised) | 2,649
    Hosts that both have a file and received a hit | 80
    Hosts that have a file but recorded no hit | 50
    Distinct hosts receiving any /llms.txt hit (raw) | 3,236

    Source: Datasets B and C, joined on www-normalised host.

    Two things stand out. First, the vast majority of hosts that received /llms.txt requests do not even have the file: only 80 of the 2,649 hosts that saw a request also returned a 200, so bots and tools are probing for it speculatively and hitting 404s. Second, of the hosts that do have a real file, more than a third, 50 of 130, saw no recorded hit at all in the window. Presence and attention are only loosely coupled.
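
    The join behind this table is a set comparison on normalised host names. A minimal sketch, assuming www-normalisation means lowercasing and stripping a leading "www." (the host names are invented):

```python
def normalise_host(host):
    """Lowercase a host name and strip a leading 'www.'."""
    host = host.strip().lower()
    return host[4:] if host.startswith("www.") else host


def presence_vs_traffic(hosts_with_file, hosts_with_hits):
    """Compare hosts serving a file against hosts that received a request."""
    have_file = {normalise_host(h) for h in hosts_with_file}
    got_hit = {normalise_host(h) for h in hosts_with_hits}
    return {
        "file_and_hit": len(have_file & got_hit),
        "file_no_hit": len(have_file - got_hit),
        "hit_no_file": len(got_hit - have_file),
    }


# Invented host names for illustration only.
overlap = presence_vs_traffic(
    ["www.a.example", "b.example"],
    ["a.example", "WWW.c.example", "c.example"],
)
print(overlap)
```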



    Limitations and an invitation to challenge

    Here is where this audit stops short.

    User agents are self-declared, so the 92.2% Other bucket could hide real AI activity behind generic strings. I have deliberately under-counted LLM activity rather than over-claim it. The hit datasets carry no per-event timestamps, so the 30-day window is the extraction window rather than a field I can re-derive. Fetched does not mean used: nothing in server logs can prove that any provider used llms.txt content in a model output, because logs show requests, not downstream use. This is a snapshot, a single 30-day window compared qualitatively to a prior one, not a continuous time series. And referrer behaviour is provider-dependent, so the absence of a referrer trail is strong evidence of no observable routing rather than absolute proof that no provider ever crawls from the file.

    If you can replicate, extend, or contradict any of this with your own logs, I want to hear about it. I will investigate and publish a visible correction if anything here proves wrong.


    Frequently asked questions

    How many websites actually have an llms.txt file? In this audit, of 4,685 domains probed, 137 returned a working 200 response at /llms.txt, which is about 2.9%. If you require the file to be served as plain text the number is 111, and if you require it to contain real content it drops to 20.

    What percentage of websites have llms.txt? On this AEM-hosted sample, between 0.4% and 2.9% depending on how strictly you define a working file. The headline figure of 2.9% counts any 200 response; the strict figure of 0.4% counts only plain-text files with measurable content.

    Do large language models actually read llms.txt? Rarely, on this evidence. Verifiable LLM agents accounted for 258 of 22,494 requests to the file, which is 1.1% of all traffic, over a 30-day window across thousands of domains.

    Does ChatGPT use llms.txt? OpenAI’s search and user agents, OAI-SearchBot and ChatGPT-User, made 209 requests across roughly 69 hosts. That is real but tiny, and there is no evidence in the logs that the file drives any onward crawling.

    Does Google use llms.txt? Googlebot is now the single largest named crawler hitting the file, with 1,219 requests. Google has also begun including llms.txt in Lighthouse checks. A fetch is not proof of use in ranking or AI features, but it is a clear change from a year ago.

    Does Gemini or Google AI Mode use llms.txt? I cannot confirm this from the data. What I can confirm is that Googlebot is fetching the file. Whether that content feeds AI Mode or AI Overviews is plausible but unproven on these logs.

    Does Claude use llms.txt? Anthropic’s ClaudeBot made 10 requests to the file across the entire dataset. That is negligible.

    Does Perplexity use llms.txt? Perplexity’s agents made 5 requests in total, PerplexityBot and Perplexity-User combined. That is negligible.

    Is llms.txt worth creating in 2026? My view is yes, but as cheap insurance rather than a growth lever. It costs little to create, Google is now hitting it, and the upside is asymmetric if providers begin to consume it. Do not expect it to move LLM citations today.

    Will llms.txt improve my rankings? There is no evidence in this data that it does. Crawlers enter via the homepage and follow internal links. Classical crawlability and internal linking remain far more important.

    Will llms.txt get my brand cited in AI answers? Probably not at present. The models that drive consumer AI answers are barely touching the file, and there is no observable crawl activity downstream of it.

    Do LLMs crawl the .md files listed in llms.txt? There is no evidence that they do so directly from the file. I found zero requests whose referrer was an llms.txt URL, so either crawlers do not crawl immediately after reading it, or they do not preserve the referrer.

    How do LLMs and AI crawlers find pages to crawl? Predominantly via the homepage. When requests carried a referrer it was almost always the domain homepage, indicating crawlers enter there and follow internal links, exactly as classical crawlers do.

    Should llms.txt be plain text or HTML? Plain text. In this audit, 23 of the 175 200-responses were served as HTML, and those are almost certainly catch-all pages rather than real llms.txt files. A real file should return text/plain.

    Why do so many llms.txt requests return a 404? Because most sites do not have the file. In this crawl, 76.9% of probed URLs returned a 404. Many bots and tools probe for /llms.txt speculatively and simply hit a missing file.

    What bots hit llms.txt the most? The largest single sources are unverified tooling and monitoring bots, followed by Googlebot as the largest named crawler. The top ten user-agent strings alone made up 78.1% of all traffic to the file.

    Are most llms.txt hits really from AI models? No. 92.2% of traffic came from agents that are neither mainstream search engines nor verifiable LLMs, largely SEO tools, monitors, and AI-readiness auditors. Only 1.1% came from verifiable LLMs.

    What is an llms.txt auditor bot? It is a crawler, often from a GEO or SEO tool, whose purpose is to check whether a site has an llms.txt file and report on it. In this dataset, agents that self-label as auditors, monitors, or readiness-checkers accounted for 60.1% of all traffic to the file.

    Does having an llms.txt file guarantee bots will read it? No. Of the 130 hosts with a real file, 50 recorded no hit at all in the window. Presence and attention are only loosely coupled.

    How big should an llms.txt file be? The proposal envisions a curated index, but in practice the files that had content in this audit were very short, between 2 and 69 words, suggesting most are minimal stubs. Aim for a genuinely useful, curated list of your most important pages rather than a token file.

    Is llms.txt the same as robots.txt or sitemap.xml? It is similar in concept, a small conventional file at a predictable path, but different in standing. robots.txt and sitemap.xml are honoured by documented crawlers, whereas llms.txt only delivers value if model providers choose to read it, and on this evidence most do not yet.

    Did anything change with llms.txt between 2025 and 2026? The biggest change is Google. Googlebot went from a non-presence to the largest named crawler at the file, and Google added it to Lighthouse. Everything else stayed roughly the same: verifiable LLM usage remained negligible, and no referrer trail from the file appeared.

    Is this the largest llms.txt study? To my knowledge, yes, by number of distinct domains and by volume of bot traffic examined. The data comes from real customer domains hosted on Adobe Experience Manager, including some of the world’s largest websites.

    Where does the data in this analysis come from? From server-log and crawl data across customer domains hosted on Adobe Experience Manager, analysed with a server log file analysis tool over a 30-day window, with a companion crawl of /llms.txt paths dated 29 May 2026.

    How was the data anonymised? No customer, brand, or third-party vendor names appear anywhere in this article. Every identifier has been removed and replaced with a neutral category label, and only aggregate summary figures are published.

    Can I reproduce these findings myself? Yes, in principle. Crawl /llms.txt across your domain set and record status, content type, and word count; query 30 days of server logs for requests to /llms.txt grouped by host and user-agent string; classify user agents conservatively; and separately query the referrer field for any request whose referrer is /llms.txt.

    What is the single most important takeaway? That raw hit counts on llms.txt are misleading. Most of the traffic is the GEO ecosystem checking itself, not models consuming the file. Create the file because it is cheap and Google is now looking at it, but keep your real investment in homepage strength and internal linking.

    A note on the data and on disclosure. The findings in this article come from server-log and crawl data across customer domains hosted on Adobe Experience Manager (AEM). I analysed this data directly using a server log file analysis tool. I work in this field, and all views expressed here are my own and do not represent those of my employer. No customer, brand, or third-party vendor names appear anywhere in this article. Every identifier has been removed and replaced with a neutral category label.


    Written by Flavio Longato and published June 2026 on longato.ch. All views my own and not those of my employer. Companion analysis: llms.txt, my recommendation, August 2025. Spotted an error? Get in touch via longato.ch and I will publish a visible correction.

  • What Is LLM Crawling and Why Does It Matter?

    Large language models now crawl websites much like search engines do. But many site owners have no idea their pages are invisible to these systems. If your content cannot be read by an LLM, you lose a growing source of traffic and citations.

    I have spent years working on technical SEO, and I can tell you that the overlap between search engine optimisation and LLM readability is huge. The same foundations that help Google read your site also help ChatGPT, Perplexity, and other AI tools find and reference your content. Yet there are key differences that catch people off guard.

    How LLMs Crawl and Process Web Content

    LLM crawling follows a familiar pattern. A bot visits your site, fetches your pages, and reads the content. In traditional SEO, we talk about crawling, indexing, and ranking. With LLMs, the steps are crawling, tokenisation, and rendering. The bot arrives, collects the text, breaks it into tokens, and stores it for later use in responses.

    If a page cannot be crawled or read, no AI system will use it as a source. That means no citations, no referrals, and no visibility in AI-generated answers. This is a real problem for businesses that rely on organic discovery. According to Google’s crawler documentation, the basic principles of making content accessible to bots have not changed much. But LLMs add a few new wrinkles.

    Common Technical Blockers

    Several technical issues stop LLMs from seeing your content. The most common one is robots.txt. When LLMs first appeared around 2023 and 2024, many website owners blocked AI crawlers out of fear. They worried that models would absorb their content without giving credit. Now it is 2026, and that stance is counterproductive. More people use LLMs every day. Blocking these bots means you opt out of a real traffic channel.

    Another blocker that surprised many site owners was CDN default settings. Cloudflare, for example, started blocking AI crawlers by default for new domains in July 2025. If you use a CDN, check your bot management settings. You might be blocking AI crawlers without knowing it. In your server logs or monitoring tools, this shows up as a string of 403 or 404 errors for known LLM user agents.

    Other blockers include:

    • Inconsistent canonical tags that waste crawl budget
    • URL parameters creating duplicate pages
    • Content behind logins or paywalls
    • Heavy interstitials that block the page content

    These are familiar problems in SEO. But with LLMs, the tolerance is even lower. A search engine might still manage to parse a messy page. An LLM bot often will not bother. As Search Engine Journal explains, crawl budget matters for every type of bot, not just Googlebot.

    Why JavaScript Rendering Is the Biggest Problem

    Here is my contrarian take: the single biggest barrier to LLM visibility is not robots.txt or CDN settings. It is client-side JavaScript rendering. Most people in the SEO world stopped worrying about JavaScript a couple of years ago because Google got very good at rendering it. That gave everyone a false sense of security.

    LLMs do not render JavaScript the way Google does. When an LLM bot visits a page, it typically reads the raw HTML without executing scripts. If your content loads through React, Angular, Vue, or any other client-side framework, the bot may see an empty shell. I have personally audited sites where only 70 to 75 percent of the page content was visible to LLM crawlers. That is a huge chunk of missing information.

    From my own experience building and managing websites early in my career, I know how painful it is to fix rendering issues at the infrastructure level. You need developer resources, time, and tickets that sit in a backlog for months. Server-side rendering or static site generation is the proper fix, but it is slow to implement. Edge rendering solutions offer a faster workaround. They pre-render your pages and serve the full HTML to LLM bots, pushing visibility from partial to complete.

    How to Check Your LLM Visibility

    You should not guess whether LLMs can see your content. Test it. One practical method is to compare the word count of a fully rendered page (what a human browser sees) against what an LLM bot receives (the raw HTML response). A large gap means you have a rendering problem.
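
    The comparison can be sketched in a few lines. This example assumes you already have two HTML strings for the same URL: the raw server response and the rendered DOM exported from a browser or headless renderer. How you obtain the rendered version is outside the sketch, and the two sample strings are invented.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect words that sit outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.words.extend(data.split())


def word_count(html):
    """Number of whitespace-separated words in the visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(parser.words)


def visible_share(raw_html, rendered_html):
    """Fraction of the rendered word count already present in the raw HTML."""
    rendered = word_count(rendered_html)
    return word_count(raw_html) / rendered if rendered else 1.0


# Invented sample: a client-side app shell versus its rendered DOM.
RAW = '<html><body><div id="root"></div><script>render("app")</script><p>Hello world</p></body></html>'
RENDERED = '<html><body><div id="root"><p>one two three four five six</p></div><p>Hello world</p></body></html>'

print(f"{visible_share(RAW, RENDERED):.0%} of the rendered words are in the raw HTML")
```

    Word count is a blunt measure: it says how much text is missing, not which text. It is still enough to decide whether a page deserves a closer section-by-section look.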

    Browser extensions and specialised tools can automate this comparison. They highlight exactly which sections of your page are invisible to AI crawlers. This gives you hard data to bring to your development team. Instead of saying “we think there is a problem,” you can say “42 percent of our product page content is hidden from LLM bots, and here is the proof.”

    You should also review your robots.txt file and check for any directives that block known LLM user agents like GPTBot, ClaudeBot, or PerplexityBot. A quick audit of your CDN settings is equally important.
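
    As a rough sketch, Python's standard-library robots.txt parser can run that check for you. The robots.txt content and the example.com URL below are placeholders. The parser's matching rules are simpler than those of some production crawlers, so read the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_agents(robots_txt, url, agents=AI_AGENTS):
    """Return the agents that the given robots.txt text bars from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Placeholder rules: GPTBot is blocked site-wide, everyone else is allowed.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(SAMPLE, "https://example.com/products/"))
```

    To test a live site, download its robots.txt yourself and pass the text in. Keeping the fetch separate makes it easy to run the same check against a staging copy before you deploy a change.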

    Looking Ahead

    LLM crawling is not a passing trend. It is becoming a standard part of how people find information online. The sites that treat LLM readability as a first-class concern today will have a clear advantage as AI-driven search grows. Those that ignore it will watch their content disappear from an increasingly important channel.

    The good news is that most fixes are straightforward. Unblock AI crawlers in your robots.txt, check your CDN, and address JavaScript rendering gaps. These are not exotic tasks. They are the same kind of technical hygiene that good SEO has always demanded. The difference now is that the audience includes machines that summarise, cite, and recommend your content to millions of users.

  • What Is the Difference Between AI Mentions and Citations?

    If you have been paying attention to how AI tools like ChatGPT or Google Gemini respond to user queries, you have probably noticed that some brands appear in the text while others get a clickable link at the bottom. These are two very different things. One is a mention. The other is a citation. And the distinction matters more than most marketers realise.

    I have spent the past year studying how large language models reference brands and websites in their outputs. What I have found is that many SEO professionals conflate mentions and citations, treating them as interchangeable. They are not. Understanding the gap between the two is essential if you want your brand to show up properly in AI-generated answers.

    What Counts as a Mention in AI Answers

    A mention happens when an AI model includes your brand name or product name in the body of its response. For example, if you ask ChatGPT “how to edit PDFs,” it might write something like “Adobe Acrobat is a popular tool for editing PDF files.” That is a mention. Adobe and Acrobat appear in the text, but there is no link pointing back to Adobe’s website.

    Mentions come from the model’s training data. The AI has processed billions of web pages and learned associations between brands and topics. It knows that Adobe is connected to PDF editing because that relationship appeared thousands of times across the data it was trained on. The model is not fetching this information live from the web. It is recalling patterns from its training.

    This is an important point. A mention does not mean the AI visited your website or verified your content. It simply means your brand is strongly associated with a given topic in the model’s learned knowledge. You could have zero indexable pages and still get mentioned if your brand is well-known enough.

    How Citations Work Differently

    A citation is something else entirely. It occurs when the AI links to your page as a supporting source for its answer. This typically happens through retrieval-augmented generation (RAG), where the model actively searches the web or a defined index to pull in fresh information before composing its response.

    When a system like Microsoft Copilot (formerly Bing Chat) or Google’s AI Overviews performs a live search, it retrieves web pages, extracts relevant information, and then weaves that into its answer. The pages it pulled from get listed as citations, usually with clickable links. This is a much stronger signal than a mention because it means the AI treated your content as evidence.

    Think of it this way. A mention says “this brand exists and is relevant.” A citation says “this specific page helped me answer the question.” The difference in value is significant for anyone thinking about generative engine optimisation.

    Why Most Marketers Get This Wrong

    Here is my contrarian take. Most of the current discourse around “AI SEO” focuses too heavily on getting mentioned. People celebrate when ChatGPT name-drops their brand. But a mention without a citation is a bit like being talked about at a party without anyone knowing your address. It builds awareness, sure. But it does not drive traffic or prove authority in the way a citation does.

    I have seen brands with strong mentions but almost no citations. Their names appear in AI answers, yet the models never link back to their actual content. This usually happens when a brand is famous but its web pages are not structured well enough to be retrieved by RAG systems. The opposite also exists: smaller, well-optimised sites earn citations despite having lower brand recognition.

    The practical lesson here is that optimising for citations requires a different approach than optimising for mentions. Mentions grow from brand awareness and PR. Citations grow from having well-structured, authoritative, and schema-marked content that RAG systems can easily retrieve and verify.

    What This Means for Your Strategy

    If you are serious about showing up in AI-generated results, you need to work on both fronts. For mentions, focus on building genuine brand authority across the web. Get covered by reputable publications. Build a strong presence on platforms that LLMs are trained on. This is long-term brand building.

    For citations, the work is more technical. Make sure your pages are crawlable, fast, and clearly structured. Use proper headings. Include factual, verifiable claims. According to Google’s own E-E-A-T framework, content that demonstrates first-hand experience and expertise is more likely to be deemed trustworthy. RAG systems appear to follow similar logic when selecting which sources to cite.

    From my own testing, pages that answer specific questions clearly and concisely tend to earn more citations than long, rambling guides. The AI is looking for evidence, not filler. Give it a clean answer it can point to.

    The brands that will win in this new era are the ones that understand both signals and treat them as complementary. Mentions build the top of funnel. Citations build the trust. Get both right, and you are well positioned regardless of how AI search evolves from here.

  • What Is AI Visibility Score and How Do You Measure It

    If you have been working on getting your brand visible inside AI-generated answers, you have probably come across the term “visibility score.” It sounds straightforward, but the reality is messier than most people expect. I have spent a fair amount of time testing different AI visibility tools, and I want to share what I have learned about what this metric actually means and which supporting numbers you should watch alongside it.

    What a Visibility Score Actually Tells You

    A visibility score is an aggregate metric. It rolls up several signals into a single number that represents how often and how prominently your brand appears across a set of AI prompts. The inputs typically include whether you were mentioned, whether a citation pointed back to your site, where in the answer your brand appeared, and the sentiment of the mention.

    The trouble is that every tool calculates it differently. There is no universal standard. LLM Optimizer, for instance, weights mentions, citations, URL presence, position (first, second, third, fourth), and sentiment into a composite figure. A brand that gets mentioned first with a positive tone and a backlink scores far higher than one that appears third with no citation and a neutral tone. Other platforms may skip sentiment entirely or weigh position differently.

    This lack of standardisation is something I think the industry needs to address quickly. If you compare your score across two different tools, you might get wildly different numbers for the same set of prompts. That makes benchmarking against competitors tricky unless everyone agrees on one platform.

    A Real-World Example of How Scores Break Down

    Let me walk through a practical case. Take the prompt “how to make the perfect espresso shot.” In LLM Optimizer, a brand tracking that prompt might see a visibility score of around 22. Why so low? Because the brand was mentioned but had no citation link. The sentiment was neutral, not negative, which helps, but the absence of a URL pointing back to the site drags the score down considerably.

    The ideal scenario would be a mention in the first position, a direct citation to your website, and positive sentiment. That combination pushes you towards 100%. In my experience, very few brands consistently hit that ceiling across a broad prompt set. The ones that do tend to have strong topical authority and structured data that AI models find easy to reference. According to research from Search Engine Land, brands that invest in entity-based SEO tend to perform better in AI-generated results precisely because large language models favour well-structured, authoritative sources.
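
    To make the idea of a composite score concrete, here is a purely hypothetical sketch. The weights, the position scale and the sentiment values are my own inventions for illustration. They are not the formula used by LLM Optimizer or any other tool, and the numbers they produce will not match any vendor's output.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights, chosen only so that the best case adds up to 100.
WEIGHTS = {"mention": 20.0, "citation": 40.0, "position": 25.0, "sentiment": 15.0}
SENTIMENT = {"positive": 1.0, "neutral": 0.5, "negative": 0.0}


@dataclass
class PromptResult:
    mentioned: bool
    cited: bool
    position: Optional[int]  # 1 = first brand named, None if not mentioned
    sentiment: str = "neutral"


def position_factor(position):
    """1.0 for first place, falling to 0.25 for fourth, 0.0 otherwise."""
    if position is None or position < 1 or position > 4:
        return 0.0
    return (5 - position) / 4


def visibility_score(result):
    """Weighted sum of the four signals on a 0 to 100 scale."""
    mention = 1.0 if result.mentioned else 0.0
    citation = 1.0 if result.cited else 0.0
    sentiment = SENTIMENT[result.sentiment] if result.mentioned else 0.0
    return (
        WEIGHTS["mention"] * mention
        + WEIGHTS["citation"] * citation
        + WEIGHTS["position"] * position_factor(result.position)
        + WEIGHTS["sentiment"] * sentiment
    )


best_case = PromptResult(mentioned=True, cited=True, position=1, sentiment="positive")
mention_only = PromptResult(mentioned=True, cited=False, position=3, sentiment="neutral")

print(visibility_score(best_case), visibility_score(mention_only))
```

    Even in a toy model like this, a missing citation costs more than any other single signal. That is the same pattern the espresso example shows, whatever the exact weights a given tool uses.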

    Why Visibility Score Alone Is Not Enough

    Here is where I hold a view that goes against the grain. Many marketers treat visibility score as the single north-star metric for AI search performance. I think that is a mistake. The score is too broad to act on directly. If your visibility score drops by ten points this week, what exactly do you fix? The number itself does not tell you.

    You need to pair it with more granular metrics. Brand mentions over time show you whether your presence is growing or shrinking. Citation tracking tells you if AI models are actually linking back to your content. Agentic traffic and referral data from tools like Google Analytics reveal whether those AI mentions translate into real visits. Without these supporting signals, you are flying blind with a single number that could move for a dozen different reasons.

    I have been doing SEO and digital marketing for over fifteen years, and every time a new “single metric” emerges, teams fixate on it at the expense of nuance. Visibility score is useful for board-level reporting, but the actual optimisation work happens when you drill into the components beneath it.

    Do Not Forget AI Features in Traditional Search

    One detail that often gets overlooked is that AI features inside traditional search results, such as Google’s AI Overviews, are frequently counted as part of your overall search performance reports. This means your visibility score and your standard SEO metrics are not entirely separate worlds. If you are tracking performance in Google Search Console, some of those impressions may already include AI-generated snippets.

    The practical takeaway is that you need to blend your AI visibility data with your existing search analytics. Looking at either in isolation gives you an incomplete picture. A high visibility score in ChatGPT or Perplexity means little if those mentions never convert into site traffic, and a dip in organic impressions might partly be explained by shifts in AI feature placement rather than a ranking penalty.

    Picking the Right Metrics for Your Situation

    If I had to recommend a starting dashboard for AI visibility, it would include four things: the aggregate visibility score for trend monitoring, citation count with URLs to see which pages AI models prefer, sentiment breakdown to catch reputation issues early, and referral traffic from AI sources to measure actual business impact.

    Start with those four and expand as your understanding deepens. The tools are evolving quickly and standardisation will come eventually. Until then, pick one platform, learn its methodology inside out, and resist the temptation to chase a perfect score. The brands that win in AI search will be the ones that understand what sits behind the number, not just the number itself.

  • How to Map Prompts to Personas for Better LLM Visibility

    Most businesses treat their audience as one big group when optimising for large language model visibility. They write a single set of prompts, test them broadly and call it a day. The trouble is, averaging your visibility across an entire audience hides the gaps where you are invisible to the people who matter most. Mapping prompts to specific personas is the fix, and it is simpler than you might think.

    Why One-Size-Fits-All Prompting Falls Short

    When I first started testing how brands appear inside AI-generated answers, I made the same mistake everyone else does. I wrote prompts from my own point of view and assumed the results spoke for the whole market. They did not. A procurement director searching for manufacturing software asks questions nothing like those a graduate engineer would type. Their vocabulary differs, their intent differs and the depth of answer they expect differs. If you only test with generic prompts, you will see a comfortable average that masks real blind spots.

    Research from the Search Engine Land guide on GEO confirms that generative engine optimisation requires thinking about user intent at a granular level. Generic content may rank, but it rarely gets cited when an LLM assembles a tailored response for a specific user need.

    What Persona-Based Prompt Mapping Actually Means

    Persona-based prompt mapping means grouping your test prompts by a real user type. Not a fictional marketing avatar with a name and a stock photo, but a practical profile built on genuine differences in intent, language and expectations. Think of categories like these:

    • Decision makers who need ROI figures and comparisons.
    • Practitioners who want step-by-step technical detail.
    • Beginners who ask broad, exploratory questions.
    • Troubleshooters who arrive with a specific problem to solve.

    Each group phrases questions differently and expects a different shape of answer. A decision maker might prompt an LLM with “best enterprise CRM for mid-market manufacturers,” while a practitioner asks “how to configure lead scoring rules in HubSpot.” Testing both tells you where your content actually performs and where it vanishes.

    How I Build Persona Prompt Clusters

    Inside LLM Optimizer, the workflow I recommend starts with listing your ideal customer profiles. For each profile, brainstorm the questions that person would realistically put to ChatGPT, Gemini or Perplexity. Group those questions into topic clusters, then run them as tracked prompts.

    Here is a contrarian take that might raise eyebrows: I believe most SEO professionals over-invest in keyword volume data and under-invest in prompt diversity. Volume tells you what people typed into Google last month. Prompt mapping tells you what people will ask an AI model tomorrow. The two data sets overlap, but they are not the same, and the gap is growing as conversational search behaviour evolves. A study by researchers at Princeton, IIT Delhi and other institutions showed that GEO tactics like authoritative language and citation inclusion boosted visibility in generative engines by up to 40 percent, although the effect varied across domains.

    Once your clusters are running, compare visibility scores across personas. You will almost certainly find that your brand shows up well for one audience segment and poorly for another. That gap is your opportunity.
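
    A small sketch of that comparison, assuming you can export one record per tracked prompt with the persona it belongs to and whether your brand was mentioned. The record layout and the sample data are hypothetical. A real export from a tracking tool will have its own fields.

```python
from collections import defaultdict


def mention_rate_by_persona(records):
    """Return {persona: share of prompts in which the brand was mentioned}."""
    totals = defaultdict(lambda: [0, 0])  # persona -> [mentions, prompts]
    for persona, _prompt, mentioned in records:
        totals[persona][0] += int(mentioned)
        totals[persona][1] += 1
    return {persona: hits / count for persona, (hits, count) in totals.items()}


def weakest_persona(rates):
    """Persona with the lowest mention rate."""
    return min(rates, key=rates.get)


# Hypothetical results: (persona, prompt, brand mentioned?)
RECORDS = [
    ("decision maker", "best enterprise CRM for mid-market manufacturers", True),
    ("decision maker", "CRM total cost of ownership comparison", True),
    ("practitioner", "how to configure lead scoring rules in HubSpot", True),
    ("practitioner", "how to sync CRM contacts with an ERP", False),
    ("beginner", "what is a CRM", False),
    ("beginner", "do small manufacturers need a CRM", False),
]

rates = mention_rate_by_persona(RECORDS)
for persona, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{persona}: {rate:.0%}")
print("Largest gap:", weakest_persona(rates))
```

    With only a handful of prompts per persona the rates are noisy, so read them as a direction for content work instead of a precise measurement.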

    Filling the Gaps Your Data Reveals

    After identifying weak spots, the content work becomes targeted. If decision makers see your brand but beginners do not, you likely lack introductory explainer content. If troubleshooters find you but practitioners do not, your how-to guides may need more technical depth. This is where first-hand experience matters. I have spent the past two years auditing LLM outputs for clients across manufacturing, SaaS and professional services, and the pattern repeats: brands that write for a single reader profile leave entire personas on the table.

    The Google helpful content guidelines stress demonstrating experience and expertise. That principle applies just as strongly to LLM visibility. Models trained partly on web content inherit the same quality signals. If your page reads like it was written by someone who has genuinely done the work, it stands a better chance of being surfaced in an AI-generated answer.

    Where This Is Heading

    Persona-based prompt mapping is not a one-off audit. As LLMs update their training data and refine how they select sources, the prompts that matter will shift too. I run my clusters on a rolling monthly cycle so that changes surface quickly. The brands that build this habit now will have a structural advantage as AI-driven search grows. Those still relying on a single averaged visibility score will keep wondering why their traffic from generative engines stays flat.

    Start small. Pick two or three personas, write ten prompts for each and track the results for a month. The data will speak for itself, and you will never go back to treating your audience as a single block again.

  • What Is AI Brand Monitoring and Why Does It Matter

    I have spent the past year watching how large language models talk about brands, products and services. What I have found is both fascinating and slightly unsettling. AI systems do not pull from a single, frozen database. They update, they re-crawl, and they change their answers without warning. If you are not keeping an eye on what they say about you, you are flying blind.

    Why AI Answers About Your Brand Keep Shifting

    Most people assume that once an AI gives a correct answer about their company, the job is done. That is wrong. Models get retrained. The web changes daily. Even a small tweak to a user’s prompt can produce a wildly different output. I have seen cases where a brand was cited accurately on Monday and dropped entirely by Thursday. Three weeks later it reappeared. This is not a bug; it is how these systems work.

    Think of it as quality assurance for external narratives. You already monitor your Google rankings, your social mentions and your review scores. AI brand monitoring is simply the next layer. According to Gartner’s overview of generative AI, these models are reshaping how consumers discover and evaluate products. If you ignore that channel, someone else will fill the gap with information you cannot control.

    The Real Cost of Incorrect AI Responses

    Here is where my experience diverges from the usual optimism. Many marketers treat AI visibility as a nice-to-have. I would argue it is closer to a reputational risk. I have personally encountered third-party websites carrying outdated or flat-out wrong product descriptions. When an LLM picks up that misinformation and serves it to a potential customer, the damage is real. The customer might buy the wrong product, receive a service that does not match expectations, or simply lose trust in the brand.

    Returns, complaints and negative word of mouth all follow. BrightLocal consumer surveys have found that many consumers trust online reviews as much as personal recommendations. When that information comes from an AI chatbot, the stakes are even higher because users often treat it as a single authoritative source rather than one result among many.

    How Weekly Monitoring Catches Problems Early

    Daily checks are available, but from what I have seen, a weekly cadence strikes the right balance between vigilance and practicality. Tools like LLM Optimizer let you track how and when your brand appears in AI-generated answers over time. You get a historical view that shows patterns rather than snapshots.

    A weekly review lets your team spot factual errors before they spread. Maybe your website is missing key product specifications. Maybe a competitor comparison on an external site is misleading. Maybe your opening hours changed six months ago and nobody updated the third-party listing. These are exactly the sorts of gaps that LLMs surface, and fixing them improves not just your AI visibility but your overall online accuracy.

    I keep a simple checklist: run the monitoring report, flag any new errors or omissions, trace each issue back to its source, and fix it there. Most weeks there is nothing urgent. But when something does slip through, catching it in seven days rather than seven months can save a significant amount of revenue and reputation.

    A Contrarian View on Chasing AI Visibility

    I should be honest about something. Not every business needs to obsess over AI brand mentions right now. If your customers are not yet using ChatGPT, Gemini or Copilot to research your type of product, pouring resources into LLM optimisation may be premature. The people selling AI monitoring tools have an obvious incentive to tell you otherwise. Start by checking whether AI-generated answers actually appear for queries relevant to your industry. If they do not, focus your energy on the channels that already drive revenue and revisit AI monitoring in six months.

    That said, for any brand operating in a space where consumers do turn to AI for recommendations, comparisons or how-to guidance, monitoring is not optional. The information gap between what you publish and what AI tells users will only widen if left unchecked. A study from the Reuters Institute Digital News Report highlights how quickly AI-driven search is changing information discovery habits, and the trend shows no sign of slowing.

    Getting Started Without Overcomplicating It

    You do not need a massive budget or a dedicated team. Pick two or three prompts that a potential customer might type into an AI chatbot about your brand. Run them yourself across ChatGPT and at least one other model. Note what comes back. Is it accurate? Is your brand mentioned at all? Are competitors positioned more favourably?

    Do this once a week for a month. You will quickly see whether the answers are stable or volatile, correct or misleading. From there you can decide whether a paid monitoring tool is worth the investment or whether manual checks are enough for your scale. The important thing is to start looking, because what AI says about your brand is already shaping how people perceive you, whether you are watching or not.

  • How to Improve What AI Says About Your Brand

    AI assistants are quickly becoming the first place people turn when researching a product or service. If what ChatGPT, Gemini or Perplexity says about your brand is wrong, outdated or vague, you are losing trust before a prospect ever visits your website. The good news is that you can shape these answers, not by flipping a secret switch, but by improving the information environment that AI models pull from.

    Why AI Answers Matter More Than You Think

    When someone asks an AI assistant about your business, the model does not make things up from thin air. It synthesises information from web pages, documentation, reviews and third-party mentions. If those sources contain conflicting details, the AI will either pick one at random or hedge with a vague summary. Neither outcome helps your business.

    I have seen this first-hand with clients who updated their product line months ago but never revised the copy on their own website. The old specs kept appearing in AI-generated summaries because the model had no reason to prefer the new information over the old. Consistency across every touchpoint is not optional; it is the foundation of accurate AI representation.

    Make Your Website the Clearest Source of Truth

    The single most effective step is to turn your own site into the most authoritative, up-to-date reference for everything about your brand. That means reviewing every page for outdated claims, conflicting prices, retired features and broken links. If your About page says one thing and your FAQ says another, an AI model has no reliable way to decide which is correct.

    Start with the basics. Make sure product descriptions, service offerings and company details match across every page. Add supporting evidence wherever you can: methodology notes, data points, case studies and structured documentation. According to Google’s structured data guidelines, well-organised markup helps crawlers understand content faster and more accurately. The same principle applies to the AI crawlers that now fetch your pages on behalf of large language models.

    One thing many guides skip is crawlability. If important pages sit behind JavaScript tabs, login walls or lazy-loading scripts that block bots, AI systems simply will not see the content. Check your robots.txt and make sure the pages you care about most are fully accessible.

    Align Third-Party Sources With Your Message

    Your website alone is not enough. AI models weigh third-party mentions heavily because independent sources signal credibility. If a well-known review site describes your service differently from how you describe it yourself, the AI may favour the external version.

    Audit what others say about you. Search for your brand on major directories, review platforms and industry publications. Where the information is wrong, reach out and request corrections. Where it is simply thin, consider contributing guest posts or providing updated media kits that journalists and bloggers can reference. Tools like LLM Optimizer let you inspect which citations AI models are pulling for specific prompts, so you can see exactly where the gaps are.

    Here is where I hold a contrarian view: most marketers focus on creating new content to influence AI answers. I believe the higher-return activity is fixing existing content. A single contradictory page on an authoritative domain can override ten blog posts on your own site. Correcting that one page often does more than a month of fresh publishing.

    Use Prompt-Based Auditing to Track Progress

    You would not run a paid ad campaign without checking the metrics. The same logic applies here. Regularly query AI assistants with the prompts your customers are likely to use. Note what the model says, which sources it cites, and whether the answer has improved since your last check.

    In the video above, I walk through a practical example using an espresso machine brand. The company wanted AI assistants to recommend a specific brewing time. By ensuring their own site stated the same figure that appeared on reputable coffee review sites, the AI answer converged on the correct recommendation. It was not instant, but over a few weeks the results shifted noticeably.

    Document your findings in a simple spreadsheet: prompt, AI response, cited sources, date. Over time this gives you a clear picture of which changes moved the needle and which did not. Search Engine Land’s guide on influencing AI answers offers a useful framework for structuring this kind of audit.
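
    If you prefer a file to a hand-maintained spreadsheet, the same four columns can be appended to a CSV. This is a minimal sketch: the column names, the sample prompt and the example.com source are my own placeholders, and the demo writes to an in-memory buffer so nothing touches your disk.

```python
import csv
import io
from datetime import date

FIELDS = ["date", "prompt", "ai_response", "cited_sources"]


def append_audit_row(handle, prompt, ai_response, cited_sources, day=None):
    """Append one audit row, writing the header first if the file is empty."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    if handle.tell() == 0:
        writer.writeheader()
    writer.writerow({
        "date": (day or date.today()).isoformat(),
        "prompt": prompt,
        "ai_response": ai_response,
        "cited_sources": " | ".join(cited_sources),
    })


# In-memory buffer for the demo. For a real file, use
# open("ai_audit.csv", "a", newline="", encoding="utf-8") instead.
log = io.StringIO()
append_audit_row(log, "best espresso machine for beginners",
                 "Brand not mentioned.", [], date(2026, 6, 1))
append_audit_row(log, "best espresso machine for beginners",
                 "Brand listed second.", ["https://example.com/review"], date(2026, 6, 8))
print(log.getvalue())
```

    Keeping the prompt text identical from week to week matters more than the tooling, because small wording changes can shift the answer on their own.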

    What This Means Going Forward

    AI-generated answers are only going to become more prominent. As models improve and more people rely on them for purchase decisions, the brands that maintain clean, consistent and well-sourced information will have a structural advantage. Those that ignore this shift risk being misrepresented in the very conversations that drive buying decisions.

    The work is not glamorous. It is auditing old pages, emailing webmasters and updating product specs. But it is the kind of steady, evidence-based effort that compounds over time. Start with your own site, expand to third-party sources and measure the results. The brands that treat AI accuracy as an ongoing discipline, rather than a one-off project, will be the ones that earn the most accurate and favourable mentions in the months ahead.

  • What Are AI Citations and Why They Can Be Wrong

    Most people assume that when an AI assistant provides a link, it must be real. After all, the tool searched the web and found a source, so the citation should be trustworthy. The truth is far less reassuring. AI citations are a mixture of genuine references and fabricated URLs, and the difference between the two is not always obvious.

    In this article, I explain what AI citations actually are, how AI assistants decide when to fetch outside sources, and why you should verify every link before trusting it.

    How AI Assistants Choose Between Memory and Search

    An AI assistant can respond in two fundamentally different ways. The first is answering from its training data, the vast body of text it was exposed to during training. When you ask something general, such as how to edit a PDF, the model often has enough stored knowledge to produce a useful answer without looking anything up. The second approach involves a retrieval step. The model searches the web or pulls documents from an index, then writes an answer grounded in those documents.

    I have tested this myself many times. A question like “how do I edit a PDF” typically gets answered from memory. But a time-sensitive question like “what is the weather in Zurich today” forces the model to search, because its training data cannot possibly contain today’s forecast. The decision between these two paths is not random. It depends on whether the model judges the query to require fresh or external information.

    What surprises many users is that the model does not always get this judgment right. Sometimes it answers from memory when a search would have been more accurate. Other times it searches unnecessarily. This inconsistency is part of why AI-generated citations can be unreliable, and it is something most providers are still working to improve.

    What AI Citations Actually Are

    Citations in AI assistants appear as clickable links or small reference boxes alongside the generated text. In tools like ChatGPT, they often show up as numbered grey boxes that you can click to visit the source. When the assistant performs a retrieval step, these citations point to the web pages or documents it consulted. They serve a similar purpose to footnotes in academic writing: they tell you where the information supposedly came from.

    However, there is a critical distinction between citations produced after a genuine search and links that the model generates from memory. According to OpenAI’s documentation on browsing, ChatGPT uses a browsing tool to fetch real-time information. When the browsing tool is active, the citations are grounded in actual retrieved pages. When it is not, any URLs in the response come from the model’s training data, and those links may no longer exist or may never have existed at all.

    This is the core problem. The visual presentation of a citation looks identical whether the link is real or invented. There is no label that says “this one was actually retrieved” versus “this one was generated from patterns in the training data.”

    Why AI Citations Are Often Wrong

    Here is where my view parts from the optimists. I believe the citation problem in AI assistants is more serious than most people acknowledge. The term for invented references is “hallucination,” and it affects URLs just as much as it affects factual claims. A model might generate a URL that looks plausible, follows the correct domain structure, and even includes a realistic page slug, yet leads to a 404 error when you actually click it.

    I have seen this repeatedly in my own server logs. Hallucinated URLs from AI tools generate real HTTP requests that hit real websites and return 404 responses. If you run a website and notice a sudden increase in 404 errors with oddly specific paths, AI-generated links could be the cause. Research published on arXiv has likewise reported that large language models frequently produce non-existent references, especially when generating academic citations.

    The risk is not just inconvenience. If you are researching a medical question, a legal issue, or a financial decision, a hallucinated citation can lend false authority to bad information. The link looks credible. The surrounding text reads confidently. But the source does not exist.

    How to Verify AI Citations Before Trusting Them

    The practical solution is straightforward: open every citation in a new tab before you rely on it. If the page loads and contains the information the AI referenced, you can have some confidence in that particular claim. If you get a 404 or the page content does not match the AI’s summary, discard it.
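    For a longer list of citations, the same check can be scripted. The following is a minimal sketch, not a finished tool: the function names, the three labels and the user-agent string are my own placeholders, and it only tells you whether a URL resolves, not whether the page says what the AI claimed.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for a URL, or 0 if no response was received."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "citation-check/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (OSError, ValueError):
        # DNS failure, timeout, refused connection or malformed URL.
        return 0


def triage(status):
    """Map a status code to one of three hypothetical labels."""
    if 200 <= status < 300:
        return "loads"
    if status in (404, 410):
        return "missing"
    return "unclear"


def check_citations(urls, fetch=fetch_status):
    """Return {url: label}. Pass a custom fetch function to test offline."""
    return {url: triage(fetch(url)) for url in urls}
```

    A "loads" result only clears the first hurdle. You still have to read the page and confirm it supports the claim. Some servers reject HEAD requests, so treat "unclear" as a prompt to open the link by hand rather than as proof that the source is fake.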

    Beyond manual checking, look for signals that a retrieval step actually happened. In Google’s Gemini, for instance, you can sometimes see a “search” indicator that confirms the model queried the web. If no such indicator is present, treat any links with extra caution. I also recommend cross-referencing important claims with a traditional search engine. It takes an extra minute, but it can save you from citing a source that does not exist.

    Some users assume that paid tiers or newer models are immune to this problem. They are not. While retrieval-augmented generation has improved citation accuracy, no current system guarantees that every link is valid. Trusting AI citations blindly is a habit worth breaking now, before it costs you credibility.

    Where AI Citations Are Heading

    The next generation of AI tools will likely separate retrieved citations from generated ones more clearly. I expect we will see explicit labelling, confidence scores, and perhaps even automated link-checking built into the response pipeline. Some early experiments with attributed question answering at Google Research point in this direction.

    Until those safeguards arrive, the responsibility sits with the user. Every AI citation is a claim, not a fact. Treat it accordingly, verify before you share, and remember that a confident-sounding answer with a broken link is worse than no answer at all.

  • What Is Sentiment in AI Answers and Why Does It Matter

    When you ask ChatGPT or Google’s AI Overview a question, the words it chooses carry emotional weight. That emotional direction, whether positive, negative or neutral, is what we call sentiment. Most people never think about it, but sentiment in AI-generated answers quietly shapes how users feel about brands, products and even medical advice.

    What Sentiment Actually Means in AI Responses

    Sentiment is the emotional tone embedded in language. In a traditional search result, you click through to a webpage and form your own opinion from the content you read. With AI answers, the model has already done that work for you. It has synthesised sources, picked specific words and delivered a response that leans in a particular emotional direction.

    This matters because the AI’s word choices influence perception at scale. If an AI assistant describes a brand as “reliable and well-regarded,” that is a positive sentiment signal. If it says a product “has faced criticism for quality issues,” that is negative. The user did not visit any website. They simply absorbed the AI’s framing as fact.

    I have spent the past year building and refining sentiment tracking inside our LLM Optimizer tool, and the patterns we see are striking. The same brand can shift from mostly negative AI mentions to positive ones over a matter of weeks, depending on what new content the models ingest.

    Where Sentiment Analysis Gets It Wrong

    Here is my contrarian take: most off-the-shelf sentiment analysis is not good enough for AI answer monitoring. Standard NLP classifiers were trained on product reviews and social media posts. They struggle badly with the nuanced, synthesised language that large language models produce.

    We hit this problem early on. Take the query “best protein for weight loss.” The word “loss” is typically flagged as negative by basic sentiment models. But in a health and fitness context, weight loss is the desired outcome. It is entirely positive. We saw the same issue with pharmaceutical queries, where terms like “drug,” “side effects” and “withdrawal” kept triggering spurious negative scores even when the AI answer was recommending a product favourably.
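    The failure mode is easy to reproduce with a toy example. This sketch is not how our tool works internally; the word lists and the override table are invented for illustration. It only shows why a generic lexicon misreads the query and what a domain-aware layer changes.

```python
import re

# Toy word lists, invented for this illustration only.
NEGATIVE = {"loss", "drug", "withdrawal"}
POSITIVE = {"best", "reliable", "effective"}

# Hypothetical domain overrides: phrases scored as a unit before the
# word-level lexicon gets to see their individual words.
DOMAIN_OVERRIDES = {
    "fitness": {"weight loss": 1},
}


def naive_score(text):
    """Count positive words minus negative words, ignoring context."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def domain_score(text, domain):
    """Score known domain phrases first, then fall back to the naive lexicon."""
    lowered = text.lower()
    score = 0
    for phrase, value in DOMAIN_OVERRIDES.get(domain, {}).items():
        score += lowered.count(phrase) * value
        # Remove the phrase so its words are not scored a second time.
        lowered = lowered.replace(phrase, " ")
    return score + naive_score(lowered)


query = "best protein for weight loss"
print(naive_score(query))              # 0: "best" and "loss" cancel out
print(domain_score(query, "fitness"))  # 2: "weight loss" counts as positive
```

    A real classifier is far more involved than a lookup table, but the principle carries over: the unit of meaning is the phrase in its domain, not the isolated word.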

    Sarcasm is another blind spot. If an AI response says something like “sure, if you enjoy waiting three weeks for delivery, this is the brand for you,” a naive classifier might score that as positive because of the word “enjoy.” According to research from Stanford’s NLP group, sarcasm detection remains one of the hardest unsolved problems in sentiment analysis, and AI-generated text adds another layer of complexity.

    Domain-specific language trips things up constantly. You need classifiers that understand industry context, not just generic positive and negative word lists.

    Why the Same Prompt Can Produce Different Sentiment

    One thing that surprises people is how inconsistent AI sentiment can be for identical queries. I have tested the same prompt on consecutive days and received answers with noticeably different tones. Sometimes the response is enthusiastic and recommending. Other times it is cautious and hedging.

    There are a few reasons for this. Large language models have a degree of randomness built into their generation process through temperature settings. Personalisation also plays a role. If the model has context about you from previous interactions, it may adjust its tone accordingly. And as models get updated with fresh training data, the underlying sentiment towards a topic can shift entirely.
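    To make the temperature point concrete, here is a minimal sketch of temperature-scaled sampling. The three tones and their scores are made up, and real models sample over tokens rather than whole tones, so read it as an analogy for the mechanism rather than a description of any provider's implementation.

```python
import math
import random


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tone(tones, scores, temperature, rng):
    """Draw one tone at random, weighted by the temperature-scaled probabilities."""
    weights = softmax_with_temperature(scores, temperature)
    return rng.choices(tones, weights=weights, k=1)[0]


tones = ["enthusiastic", "cautious", "critical"]  # hypothetical labels
scores = [2.0, 1.0, 0.0]                          # hypothetical model scores
rng = random.Random()

for temperature in (0.2, 1.5):
    draws = [sample_tone(tones, scores, temperature, rng) for _ in range(10)]
    print(temperature, draws)
```

    At the low temperature the top-scoring tone dominates almost every draw. At the higher one the alternatives appear regularly, which is the kind of day-to-day variation I describe above.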

    This variability is exactly why point-in-time sentiment checks are not enough. You need to track sentiment over time to see real trends rather than reacting to a single snapshot.

    How I Track Sentiment for Brands in Practice

    In our tool, we monitor sentiment across AI platforms on an ongoing basis. For each brand we track, the system logs whether individual AI responses are positive, neutral or negative. Over weeks and months, this builds into a trend line that tells a clear story.
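    The aggregation behind such a trend line can be sketched in a few lines. This is an illustration, not our production code: the input shape (a date plus one of three labels) and the net score, positive share minus negative share, are assumptions I chose for simplicity.

```python
from collections import defaultdict
from datetime import date


def weekly_net_sentiment(observations):
    """Group (date, label) pairs by ISO week and return a net score per week.

    The label must be "positive", "neutral" or "negative". The score is
    (positive - negative) / total, so it ranges from -1.0 to 1.0.
    """
    counts = defaultdict(lambda: {"positive": 0, "neutral": 0, "negative": 0})
    for day, label in observations:
        iso = day.isocalendar()
        counts[(iso[0], iso[1])][label] += 1  # key: (ISO year, ISO week)

    trend = {}
    for week in sorted(counts):
        c = counts[week]
        trend[week] = (c["positive"] - c["negative"]) / sum(c.values())
    return trend


observations = [
    (date(2026, 1, 5), "negative"),
    (date(2026, 1, 6), "negative"),
    (date(2026, 1, 7), "positive"),
    (date(2026, 1, 8), "neutral"),
    (date(2026, 1, 12), "positive"),
    (date(2026, 1, 13), "positive"),
]
for week, score in weekly_net_sentiment(observations).items():
    print(week, score)
```

    With only a handful of responses per week the score jumps around, so in practice each weekly bucket needs enough samples before the line means anything.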

    For one client in the nutrition space, we watched their sentiment score climb from mostly red (negative) to predominantly green (positive) over about six weeks. The shift correlated directly with a content strategy we had implemented: publishing more expert-authored articles, earning mentions on authoritative health sites and ensuring consistent brand messaging across platforms that AI models tend to reference.

    The breakdown at the prompt level is just as useful. You can see exactly which queries trigger negative sentiment and work backwards to understand why. Often it comes down to a single problematic source that the AI keeps citing, or outdated information that still lingers in the model’s training data.

    What This Means for Your Brand Going Forward

    AI answers are becoming a primary information channel for millions of users. The sentiment those answers carry about your brand is not something you can afford to ignore. Unlike traditional search where you control your own page’s messaging, AI responses are generated from a mix of sources you may not even know about.

    My recommendation is simple. Start monitoring how AI models talk about you. Look beyond just whether you are mentioned and examine the emotional tone of those mentions. Build a content strategy that feeds positive, accurate, expert-backed information into the ecosystem that these models draw from.

    Sentiment in AI answers is still a young field, and the tools for measuring it are improving rapidly. The brands that pay attention to this now will have a significant advantage as AI-generated answers become the default way people discover and evaluate products and services. The question is not whether AI sentiment matters. It is whether you are measuring it yet.

  • What Is Agentic Traffic in SEO?

    AI agents are now browsing websites on behalf of users. They click buttons, fill in forms, and pull content back to chatbots like ChatGPT and Perplexity. This new wave of automated visits is called agentic traffic, and most site owners have no idea it is happening.

    I have spent the last few months tracking how these AI agents interact with client websites. What I found surprised me. Many sites are accidentally blocking or confusing the very bots that could send them qualified visitors. In this article I will explain what agentic traffic is, why it matters, and how you can optimise for it.

    How Agentic Traffic Differs from Traditional Crawlers

    Traditional crawlers like Googlebot visit your pages, read the HTML, and index the content. They do not interact with the page. Agentic traffic is different. AI agents actively browse your site. They render JavaScript, attempt to click elements, and even try to submit forms.

    Think of it this way. Googlebot is a reader. An AI agent is a user. It behaves more like a real person would, except it makes decisions based on page structure rather than visual cues. If your site has confusing overlays, misplaced buttons, or poorly labelled form fields, the agent will struggle. And when it struggles, it leaves.

    Search Engine Land’s coverage of agentic SEO confirms that this shift is already well under way. The agents visiting your site today come from ChatGPT, Perplexity, and a growing list of AI-powered browsers.

    Why Most Websites Fail the Agentic Test

    Here is a stat that caught my attention. When I audited around a hundred websites for agentic compatibility, roughly 95% had issues with form submissions. The AI agents could not fill in and submit forms properly. That is not a small problem if your business depends on lead generation.

    The most common issues I see are:

    • CDN or firewall rules blocking AI user agents entirely
    • Heavy client-side rendering that hides content from bots
    • Overlays and pop-ups that confuse agent navigation
    • Form fields without clear labels or accessible markup

    On one client site, the AI agents could only see 49% of the page content. The rest was locked behind JavaScript rendering that the bots could not process. Half the page was invisible. That is a huge missed opportunity, and the client had no idea until we measured it.
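    One crude way to put a number on this is to compare the words in the raw HTML response with the words in the fully rendered page. The sketch below is an assumption about method, not a description of how the 49% figure was produced. It expects you to supply the rendered HTML yourself, for example from a headless browser, and it counts words rather than weighing their importance.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible words, skipping script, style and template content."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def visible_words(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.words


def visibility_ratio(raw_html, rendered_html):
    """Share of the rendered page's words already present without JavaScript."""
    rendered = visible_words(rendered_html)
    if not rendered:
        return 1.0
    return len(visible_words(raw_html)) / len(rendered)


raw = "<body><h1>Pricing</h1><div id='app'></div><script>render()</script></body>"
rendered = "<body><h1>Pricing</h1><div id='app'><p>Plans start at ten francs</p></div></body>"
print(round(visibility_ratio(raw, rendered), 2))  # 0.17: one word out of six
```

    A low ratio does not prove that a given agent misses the content, since some agents do render JavaScript. It does tell you which templates to test first.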

    Tracking and Measuring Agentic Traffic

    You cannot optimise what you do not measure. The first step is to identify which AI agents visit your site and whether they succeed. I track metrics like hit counts, success rates per agent, and the percentage of content visible to non-JavaScript renderers.

    For example, on one project I monitored 2,600 agentic hits over four weeks. ChatGPT’s agent had a 73% success rate. That means 27% of the time it failed to retrieve what it needed. By drilling into the failing URLs, I found specific pages where the CDN was blocking requests. A simple configuration change fixed it overnight.
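    The arithmetic behind a per-agent success rate is simple once the hits are in a table. A minimal sketch, assuming each hit is an (agent, URL, status) tuple and that only 2xx responses count as success; both the data shape and that threshold are my choices for the example, not a standard.

```python
from collections import Counter


def success_rates(hits):
    """Return {agent: share of hits answered with a 2xx status}."""
    totals = Counter()
    successes = Counter()
    for agent, _url, status in hits:
        totals[agent] += 1
        if 200 <= status < 300:
            successes[agent] += 1
    return {agent: successes[agent] / totals[agent] for agent in totals}


def failing_urls(hits, agent, top=5):
    """Return the URLs that most often failed for one agent, with counts."""
    failures = Counter(
        url for a, url, status in hits if a == agent and not 200 <= status < 300
    )
    return failures.most_common(top)


# Invented sample data.
hits = [
    ("ChatGPT-User", "/", 200),
    ("ChatGPT-User", "/blog", 200),
    ("ChatGPT-User", "/pricing", 403),
    ("ChatGPT-User", "/docs", 200),
]
print(success_rates(hits))                 # {'ChatGPT-User': 0.75}
print(failing_urls(hits, "ChatGPT-User"))  # [('/pricing', 1)]
```

    Sorting the failures by URL is what turns a percentage into a to-do list. In my case the failing paths clustered on pages behind one CDN rule.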

    Google’s own documentation on crawlers and user agents is a good starting point for understanding bot identification. But keep in mind that agentic bots behave differently from traditional search crawlers, so your monitoring needs to go further.

    How to Optimise Your Site for AI Agents

    My honest take: most of the advice floating around about AI optimisation focuses on prompt engineering and content formatting. That matters, but it misses the bigger problem. If the agent cannot even access your page or interact with it, your content is irrelevant. Fix the plumbing before you worry about the words.

    Here is what I recommend based on my own testing:

    • Audit your server logs for AI user agents. Check if they get 200 responses or errors.
    • Test your pages with JavaScript disabled. If critical content disappears, you have a rendering problem.
    • Remove or defer overlays that appear before the main content loads.
    • Ensure all form fields have proper labels and that forms work without JavaScript where possible.
    • Review your CDN and WAF rules. Many default configurations block legitimate AI agents.
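    The first item on that list, the log audit, can start as a short script. This sketch assumes your server writes the common "combined" access log format and that matching on a user-agent substring is good enough for a first pass. The token list is a starting assumption, not a complete registry, and user-agent strings can be spoofed, so verify anything important against the provider's published IP ranges.

```python
import re
from collections import Counter

# Substrings to look for in the User-Agent header. Assumed starting list;
# extend it with whatever shows up in your own logs.
AI_AGENT_TOKENS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "PerplexityBot", "ClaudeBot")

# Combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def audit(lines):
    """Count hits per (agent token, status class), e.g. ("GPTBot", "4xx")."""
    tally = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        token = next((t for t in AI_AGENT_TOKENS if t in match["agent"]), None)
        if token:
            tally[(token, match["status"][0] + "xx")] += 1
    return tally


sample = [
    '203.0.113.7 - - [10/Jun/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 403 512 "-" "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"',
    '203.0.113.8 - - [10/Jun/2026:10:00:05 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]
for key, count in sorted(audit(sample).items()):
    print(key, count)
```

    A cluster of 4xx responses for one agent usually points at a WAF or CDN rule rather than at the content itself.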

    The Search Engine Journal technical SEO guide covers many of these fundamentals, though their advice is geared towards traditional bots. The principles of clean structure, accessible markup, and fast server responses apply even more to agentic visitors.

    Where Agentic Traffic Is Heading

    We are still in the early days. Right now, AI agents mostly retrieve and summarise content. But the next generation of agent browsers will do much more. They will complete purchases, book appointments, and fill in applications on behalf of users. If your site is not ready for that, you will lose conversions to competitors whose sites are.

    I expect agentic traffic to become a standard metric in SEO reporting within the next year. The sites that start tracking and optimising for it now will have a clear advantage. Those that ignore it will wonder why their traffic numbers look fine but their conversions keep dropping.

    The shift from passive crawling to active browsing changes what it means to have a well-optimised website. Start measuring your agentic traffic today. Find the gaps. Fix the access issues. Your future visitors, both human and artificial, will thank you.

  • What Is GEO Measurement and Why Is It So Hard?

    Most marketers talk about Generative Engine Optimisation as though it were just SEO with a new hat. But when you try to measure GEO performance, you quickly discover a problem: the numbers shift every time you look at them. AI responses are non-deterministic, region-dependent and shaped by personalisation. That makes reliable measurement genuinely difficult.

    I have spent the past year tracking how brands appear inside AI-generated answers. In that time I have watched the same prompt return different brand mentions on Monday and Tuesday, from the same model, in the same location. If you are investing in GEO, you need to understand exactly where measurement stands today and where the gaps still sit.

    Why AI Responses Keep Changing

    Large language models are non-deterministic by design. Run the same prompt twice and you can get different wording, different sources cited and different brands mentioned. This is not a bug. It is how probabilistic text generation works. Temperature settings, model updates and cached context all influence output.

    For traditional search, you could pull rank-tracking data and trust it for a week. With GEO, a single data point tells you very little. You need repeated samples across time, regions and user states to build a picture you can act on. Google’s own documentation on AI principles acknowledges that model behaviour varies with context. That variation flows straight into your measurement challenge.

    Region and Personalisation Add Noise

    Location matters more than most people realise. A prompt about “best project management tools” will surface different brands in Spain than in the United States. The model draws on regional training data, local popularity signals and language cues. If your measurement tool does not simulate queries from the correct region, your data is misleading from the start.

    Personalisation adds another layer. When a user is logged into ChatGPT or Gemini, their conversation history and preferences shape the response. That means two users asking the same question can see completely different brand recommendations. Measuring “your” visibility in AI answers is therefore always an approximation. The best you can do is control for region, strip out personalisation where possible and sample frequently.

    Here is my contrarian take: most GEO tools overstate their accuracy because they run a handful of prompts from a single location and call it visibility data. That is not measurement. That is a screenshot. Real measurement requires hundreds of prompt variations, multiple regions and longitudinal tracking. If a vendor cannot explain their sampling methodology, treat their numbers with scepticism.
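    The difference between a screenshot and a measurement shows up as soon as you attach an uncertainty range to the mention rate. A minimal sketch using the Wilson score interval, a standard formula for proportions; the function name is mine, and it assumes every sampled response is an independent draw, which real prompt sets only approximate.

```python
import math


def mention_rate_interval(mentions, samples, z=1.96):
    """Return (rate, low, high) for a brand's mention rate.

    Uses the Wilson score interval; z=1.96 gives roughly 95% confidence.
    """
    if samples <= 0:
        raise ValueError("samples must be positive")
    p = mentions / samples
    denom = 1 + z * z / samples
    centre = (p + z * z / (2 * samples)) / denom
    half = z * math.sqrt(p * (1 - p) / samples + z * z / (4 * samples * samples)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


for mentions, samples in ((3, 5), (300, 500)):
    rate, low, high = mention_rate_interval(mentions, samples)
    print(f"{mentions}/{samples}: {rate:.0%} (range {low:.0%} to {high:.0%})")
```

    Both samples show a 60% mention rate. With five prompts the plausible range runs from roughly 23% to 88%. With five hundred it narrows to roughly 56% to 64%. Only the second number is something you can track over time.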

    The Brand Detection Problem

    One issue that caught me off guard early on was brand detection. It sounds simple: scan the AI response for your brand name and count mentions. But many brands share names with common words. Think “Apple” the fruit versus Apple the company, or “Teams” the Microsoft product versus teams of people.

    When I first tested detection scripts against real AI outputs, false positives were everywhere. A response about workplace collaboration would mention “teams” five times without once referring to Microsoft Teams. You need entity disambiguation, not string matching. Tools like those reviewed by Search Engine Journal are starting to address this, but it remains a weak spot across the industry.

    The solution involves building custom detection layers that understand context. You look at surrounding words, the prompt category and the typical entities that appear together. It is slow, manual work. But without it, your visibility score is fiction.
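    A stripped-down version of such a context layer looks like this. The cue lists are hypothetical and far too short for real use, and checking a fixed window of neighbouring words is only one simple heuristic. Proper entity disambiguation goes well beyond it, but the sketch shows the gap between counting strings and counting brands.

```python
import re

# Hypothetical context cues per brand. Real lists would be much longer
# and built from your own category data.
CONTEXT_CUES = {
    "Teams": {"microsoft", "meeting", "chat", "channel", "slack", "zoom"},
    "Apple": {"iphone", "mac", "ios", "ipad", "cupertino"},
}


def brand_mentions(text, brand, window=6, min_cues=1):
    """Count occurrences of the brand word that have a context cue nearby."""
    words = [w.lower() for w in re.findall(r"[A-Za-z0-9']+", text)]
    cues = CONTEXT_CUES.get(brand, set())
    target = brand.lower()
    hits = 0
    for i, word in enumerate(words):
        if word != target:
            continue
        # Look at up to `window` words on each side of the match.
        nearby = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        if sum(w in cues for w in nearby) >= min_cues:
            hits += 1
    return hits


generic = "Cross-functional teams work best when teams share goals."
product = "Microsoft Teams and Slack both support channel chat."
print(generic.lower().count("teams"), brand_mentions(generic, "Teams"))  # 2 0
print(product.lower().count("teams"), brand_mentions(product, "Teams"))  # 1 1
```

    The plain string count reports two brand mentions in the first sentence. The context check reports none, which is the correct answer.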

    What You Can Measure Today

    Despite these challenges, useful measurement is possible. Here is what works reasonably well right now:

    • Brand mention frequency across a large sample of prompts related to your category.
    • Sentiment analysis of how AI models describe your brand when they do mention it.
    • Share of voice compared to competitors within specific prompt clusters.
    • Regional differences in brand visibility across key markets.

    These metrics give you directional insight. They tell you whether your GEO efforts are moving the needle. They do not yet give you the precision of a Google Search Console click report. That gap is real, and pretending otherwise helps nobody.
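    Two of those metrics, mention frequency and share of voice, reduce to simple counting once brand detection has run. A minimal sketch, assuming each sampled response has already been reduced to the set of brands it mentions. The brand names are invented, and the definitions are the ones I use here rather than an industry standard.

```python
from collections import Counter


def mention_frequency(detected, brand):
    """Share of sampled responses that mention the brand at least once."""
    if not detected:
        return 0.0
    return sum(brand in brands for brands in detected) / len(detected)


def share_of_voice(detected):
    """Each brand's share of all brand mentions across the sample."""
    counts = Counter(brand for brands in detected for brand in brands)
    total = sum(counts.values())
    if not total:
        return {}
    return {brand: n / total for brand, n in counts.items()}


# One set of detected brands per sampled response (invented data).
detected = [
    {"Acme", "Globex"},
    {"Acme"},
    set(),
    {"Globex", "Initech"},
]
print(mention_frequency(detected, "Acme"))  # 0.5
print(share_of_voice(detected)["Acme"])     # 0.4
```

    The two numbers answer different questions. Mention frequency tells you how often you appear at all. Share of voice tells you how crowded the answer is when you do.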

    Tools built specifically for GEO tracking, such as those explored in Moz’s research on AI search, are improving fast. Sampling methods are getting smarter. Regional simulation is more accurate. Brand detection is catching up. But we are still in the early innings.

    Where GEO Measurement Goes From Here

    The trajectory is clear. As more organisations invest in GEO, the demand for reliable measurement will push tooling forward. I expect three shifts over the next twelve months. First, API-level access to AI platforms will allow real-time sampling at scale. Second, standardised metrics will emerge so that brands can benchmark across tools. Third, personalisation modelling will let you estimate visibility for different audience segments, not just a generic “average user.”

    My experience working with early GEO data for clients across multiple sectors has taught me one thing above all: patience matters more than precision right now. Track trends, not snapshots. Compare quarters, not days. Build your measurement practice today so that when the tools catch up, you already have baseline data to measure against.

    GEO measurement is messy, imperfect and improving fast. The brands that accept that reality and invest anyway will be the ones with a head start when the data finally sharpens.