Category: GEO

  • LLMs.txt – What You Need to Know: The Largest Audit to Date from Adobe AEM

    Published: June 2026 · longato.ch
    Companion piece: this article updates and extends my earlier write-up, llms.txt: my recommendation, August 2025.


    The six findings you can quote

    “Create llms.txt because it is cheap and Google is now looking at it, not because it will get you cited in ChatGPT today.”

    “Across 22,494 recorded requests to /llms.txt over a 30-day window, agents that are verifiably large language models accounted for 258 hits, which is 1.1% of all traffic to the file.”

    “The single biggest change since my August 2025 audit is Googlebot. It is now the largest named crawler hitting /llms.txt, with 1,219 recorded requests.”

    “92.2% of all /llms.txt traffic came from agents that are neither mainstream search engines nor verifiable LLMs. The file’s main audience today is SEO tooling, monitoring services, and AI-readiness auditors inspecting the file, not models consuming it.”

    “OpenAI’s user-facing and search agents, OAI-SearchBot and ChatGPT-User, generated 209 hits across roughly 69 hosts. That is the totality of OpenAI’s interest in /llms.txt in this dataset.”

    “In a direct referrer analysis I found zero requests anywhere in the logs, search bots included, that carried /llms.txt as their referrer. Whatever crawlers do after reading the file, they do not arrive at other URLs from it in any way the logs can see.”

    What changed since August 2025

    My August 2025 analysis examined the same question on the same kind of footprint. The qualitative shift over the intervening period is best shown side by side.

    August 2025 against June 2026

    Dimension | August 2025 (prior analysis) | June 2026 (this audit) | Direction of change
    Googlebot hitting /llms.txt | Not a meaningful presence | 1,219 hits, the largest named crawler at the file | Major increase
    Verifiable LLM hits to /llms.txt | Negligible | 258 hits, 1.1% of all traffic | Still negligible as a share
    OpenAI-specific interest | Minimal | 209 hits from OAI-SearchBot and ChatGPT-User, about 69 hosts | Slightly up, still tiny
    Dominant traffic source | Already non-LLM | Other / unverified tooling at 92.2% | The bucket has grown and professionalised
    Self-labelled audit and readiness bots | Emerging | 60.1% of all traffic | New, large category
    Referrals originating from llms.txt | None observed | Still none observed | Unchanged
    Crawler entry point | Homepage-led | Homepage-led | Unchanged

    Sources: my prior published analysis from August 2025 for the “before” column, and Datasets C and D plus the referrer analysis for the “after” column.

    The most material change is Googlebot’s arrival at /llms.txt in volume. This is consistent with a wider observation in the SEO community. Martina Raissle has noted publicly on LinkedIn that Google has begun including llms.txt in its Lighthouse checks, which is itself a signal that the file is at least on Google’s radar.

    I want to be careful about what this does and does not prove. Googlebot fetching a URL is not proof that the content is used for ranking, AI Overviews, or AI Mode. A fetch is a fetch. But it is a clear change from a year ago, and combined with the Lighthouse inclusion, it is the first concrete sign from a major provider that llms.txt is being looked at rather than ignored. I weight this as worth acting on cheaply, not as proven to work, and my recommendation below reflects that.


    My recommendation

    This is my professional judgement, grounded in the data above.

    Recommendation summary

    # | Recommendation | Supporting evidence | Confidence
    1 | Create the llms.txt file | Googlebot is now the largest named crawler at the file, 1,219 hits; Google has added it to Lighthouse checks | Moderate
    2 | Treat it as low-effort insurance, not a growth lever | Generating the file is cheap; the return is asymmetric if providers do begin to use it | High, on the cost logic
    3 | Do not expect it to move LLM brand visibility or citations today | Verifiable LLMs account for 1.1% of hits; no referrer trail exists | High
    4 | Keep investing in homepage strength and internal linking | Crawlers enter via the homepage and follow links | High
    5 | Watch Google AI Mode and AI Overviews specifically | Google’s fetching plus Lighthouse inclusion is the only mover in a year; impact there is plausible but unproven | Low, speculative

    In plain terms: create the file, because Google is now hitting it, and that alone changes the calculus from a year ago. The effort is minimal, so the return on investment is favourable if the providers do in fact consume it; you are buying a cheap option on an uncertain upside. Will it move LLM brand visibility or citations? Probably not, not yet. The traditional consumer LLMs such as ChatGPT are not meaningfully using the file on this evidence, and the honest answer is that the consumption simply is not there at the scale that would move citations. Will it affect Google’s AI Mode? Maybe. Google is the one provider showing changed behaviour. I would not bet the strategy on it, but I would not ignore it either.


    What is llms.txt?

    llms.txt is a proposed Markdown file placed at the root of a domain, for example https://example.com/llms.txt. The llmstxt.org proposal frames it as a curated, machine-readable map: a short summary of the site plus a hand-picked list of the most important pages, often with companion .md versions of those pages, so that a large language model can find and ingest the high-value content without crawling the entire site or fighting through navigation, scripts, and boilerplate. The analogy its proponents draw is to robots.txt and sitemap.xml: a small, conventional file at a predictable path that machines can rely on. The crucial difference is that robots.txt and sitemap.xml are honoured by documented, identifiable crawlers, whereas llms.txt only delivers value if the LLM providers choose to read it. Whether they do is precisely the question this audit set out to answer with logs rather than opinion.


    Why I ran this LLMs.txt audit

    Two pressures converged.

    The first was a recurring question from customers. I was being asked, on a roughly weekly cadence, whether llms.txt was actually being used, and whether it was worth the effort of generating and maintaining. That is a fair question, and it deserves a data-backed answer rather than a shrug.

    The second was the state of the GEO and AEO conversation. The generative-engine-optimisation and answer-engine-optimisation community has been circulating a lot of confident, contradictory, and frequently unsourced claims about llms.txt: that the major models definitely read it, that it definitely boosts citations, or conversely that it is completely ignored. Both extremes tend to be asserted without server logs to back them. The only responsible move was to look at what bots actually do at the file, at scale.

    This is, to my knowledge, the largest single llms.txt server-log and crawl audit conducted to date by number of distinct domains and by volume of bot traffic examined. The domains analysed are real customer sites hosted on Adobe Experience Manager, and they include some of the world’s largest websites, which is what makes the bot behaviour observed here representative rather than anecdotal.

    “Most public claims about llms.txt are made without real analysis. This audit is my attempt to replace assertion with measurement, at the largest domain scale I am aware of.”


    Methodology, scope, and caveats

    Here is the setup in full so that the findings can be challenged or replicated.

    Working with a server log file analysis tool, plus a large-scale crawl of /llms.txt paths, I assembled four datasets:

    Dataset | Purpose | Rows | Key fields
    A, domain scope log | Which hosts received bot traffic, and how many distinct bots and agents each saw | 6,122 hosts | origin_host, hits, distinct_bots, distinct_agents, first_seen, last_seen
    B, llms.txt existence crawl | Whether /llms.txt actually resolves on each host, and what it returns | 5,553 crawl rows (4,819 distinct URLs, 4,685 distinct hosts) | Address, Status Code, Content Type, Word Count, Size (Bytes), Crawl Timestamp
    C, llms.txt hits by host and agent | Every recorded request to /llms.txt, split by host and full user-agent string | 6,749 rows | Host, request_user_agent, hits
    D, llms.txt hits by agent type | The same hit volume, pre-classified by agent family | 237 rows | User Agent Type, User Agent Name, Full User Agent, Hits

    The hit data in Datasets C and D covers a 30-day window. The crawl in Dataset B carries crawl timestamps dated 29 May 2026.

    The four questions I set out to answer were:

    1. How many domains have a live llms.txt file?
    2. When an LLM reads llms.txt, does it then crawl the .md files it lists?
    3. How are LLMs actually finding the pages they crawl?
    4. Are there any referrals coming from llms.txt?

    A few caveats, stated openly:

    User-agent strings are self-declared. Any bot can claim to be anything. I classify “verifiable LLM” conservatively, counting only agents that match the documented user agents of known model providers such as OpenAI, Anthropic, Perplexity, and You.com. Hits in the “Other / unverified” bucket may include real AI activity behind generic strings, but I will not count what I cannot verify.

    Datasets C and D contain no per-event timestamp column. The 30-day window is the query window the data was extracted under; it is not re-derivable from inside the files.

    Dataset A’s first_seen and last_seen values span a short capture interval, about five minutes on 28 May 2026, which tells me these are sampling markers from one extract rather than the full 30-day span. I therefore use Dataset A only for structural facts such as host counts and bot diversity per host, and never to infer time-based volume.

    The tables below are summary tables. I am not releasing the raw logs. The figures are reproducible in principle by anyone running the same crawl and the same log query.
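
    To illustrate the conservative classification described in the first caveat, here is a minimal Python sketch. The agent names are the ones reported in this audit; the substring-matching rule, the function name classify_agent, and the fallback label are assumptions made for illustration, not the logic of the log analysis tool behind these figures.

```python
# Agent names as reported in this audit, mapped to their provider.
# Matching by case-insensitive substring is an assumption of this sketch.
VERIFIABLE_LLM_AGENTS = {
    "OAI-SearchBot": "OpenAI",
    "ChatGPT-User": "OpenAI",
    "GPTBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
    "Perplexity-User": "Perplexity",
    "YouBot": "You.com",
}

NOT_VERIFIED = "not verified as LLM"  # hypothetical fallback label


def classify_agent(user_agent: str) -> str:
    """Return the provider for a documented LLM agent name, else the fallback label."""
    lowered = user_agent.lower()
    for agent_name, provider in VERIFIABLE_LLM_AGENTS.items():
        if agent_name.lower() in lowered:
            return provider
    # Anything that does not carry a documented name is deliberately not counted.
    return NOT_VERIFIED
```

    A name match cannot prove that the sender is who it claims to be, so a sketch like this only separates declared LLM agents from everything else. Search engines and tooling would need their own rules.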


    How many domains actually have an llms.txt file?

    This is where precision matters most, because “has an llms.txt” is not a single thing. A request to /llms.txt can return a real Markdown file, a redirect, a 404, a soft-200 HTML page, or an empty 200. I broke Dataset B down by HTTP status.

    HTTP status of /llms.txt across 4,685 distinct hosts

    Status code | Meaning | Crawl rows | Share of rows
    404 | Not found (no file) | 4,270 | 76.9%
    301 | Permanent redirect | 606 | 10.9%
    200 | OK (file served) | 175 | 3.2%
    403 | Forbidden | 174 | 3.1%
    302 | Temporary redirect | 149 | 2.7%
    0 | No response or connection failure | 90 | 1.6%
    401 | Unauthorised | 47 | 0.8%
    406 | Not acceptable | 28 | 0.5%
    (blank) | No status captured | 12 | 0.2%
    410 | Gone | 1 | under 0.1%
    307 | Temporary redirect | 1 | under 0.1%
    Total | | 5,553 | 100%

    Source: Dataset B, Status Code column. The row count includes 734 duplicate URLs, which I deduplicated before counting hosts.

    A 200 response is necessary but not sufficient to call something a real llms.txt. Many 200s are HTML catch-all pages or empty bodies. So I tightened the definition in two further steps.

    How many of the 200 responses are genuinely an llms.txt file?

    Definition (progressively stricter) | Distinct hosts | Share of 4,685 probed | Share of 6,122 scope-file hosts
    Any HTTP 200 at /llms.txt | 137 | 2.92% | 2.24%
    200 and Content-Type: text/plain | 111 | 2.37% | 1.81%
    200 and word count above zero | 20 | 0.43% | 0.33%

    Source: Dataset B, Status Code plus Content Type plus Word Count columns.

    Depending on how strictly you define “has a working llms.txt”, the answer ranges from 137 hosts for any 200, down to 111 hosts for files served as plain text, and as low as 20 hosts for plain-text files with actual measurable content. The 23 responses that returned a 200 with an HTML content type are almost certainly not real llms.txt files at all.

    “Of 4,685 domains probed, only 137 returned a 200 at /llms.txt. Tighten the definition to plain text with real content and the number collapses to 20. Adoption is not just low, much of the apparent adoption is hollow.”
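
    The progressively stricter definitions can be expressed as a small filter over crawl rows. The Python sketch below is an illustration only: the CrawlRow type and its field names are hypothetical stand-ins for the Dataset B columns, and nesting the word-count test inside the plain-text test is my reading of “progressively stricter”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlRow:
    """Hypothetical shape of one crawl row (host, status, content type, word count)."""
    host: str
    status: int
    content_type: str
    word_count: int


def hosts_by_definition(rows):
    """Count distinct hosts under each definition; sets absorb duplicate rows."""
    any_200, plain_text, with_content = set(), set(), set()
    for row in rows:
        if row.status != 200:
            continue
        any_200.add(row.host)
        # startswith tolerates parameters such as "; charset=utf-8".
        if row.content_type.lower().startswith("text/plain"):
            plain_text.add(row.host)
            if row.word_count > 0:
                with_content.add(row.host)
    return {
        "any_200": len(any_200),
        "plain_text": len(plain_text),
        "plain_text_with_content": len(with_content),
    }
```

    Counting hosts in sets rather than counting rows keeps duplicate crawl rows from inflating any of the three figures.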

    Data-quality notes for the existence crawl

    Issue | Detail | How I handled it
    Duplicate URLs | 5,553 rows but 4,819 distinct addresses, so 734 duplicate rows | Deduplicated to distinct hosts before counting
    Soft-200 HTML | 23 of 175 200-responses were text/html, not a text file | Excluded from the strict definitions
    Empty 200s | 155 of 175 200-responses had a word count of zero | Reported separately and flagged as likely empty or placeholder
    Word-count range on real files | The 20 non-empty files ran from 2 to 69 words | Reported; even the “real” files are extremely short

    A word count between 2 and 69 on the files that do have content tells me most of these are minimal stubs, a title and a couple of links, rather than the rich, curated index the llmstxt.org proposal envisions. Adoption is shallow on both axes: few sites have the file, and few of those have populated it meaningfully.


    Do LLMs crawl the .md files, and are there any referrals from llms.txt?

    These two questions share one answer, and it comes from a direct analysis of the referrer field in the logs.

    I did not find a single request anywhere in the server logs whose referrer was a /llms.txt URL. This held across all bot types, search engines included, not only LLM agents.

    There are two possible explanations, and the logs alone cannot distinguish between them. Either the bots do not crawl immediately: they may read llms.txt, archive or queue what they find, and crawl later in a separate session that carries no referrer linking back to the file. Or the referrer is simply not preserved: bots may crawl the listed .md files but not populate the referrer header with the llms.txt URL.

    Either way, the practical consequence is the same. There is no observable evidence in the logs that llms.txt is functioning as a crawl-routing hub. If llms.txt were doing the job its proposal describes, feeding models a list of URLs that they then fetch, I would expect to see at least some referrer trail. I see none.
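
    For anyone who wants to repeat the referrer check on their own logs, a minimal Python sketch follows. It assumes each log record has already been parsed into a dictionary with a "referrer" key, which is a hypothetical shape; real log formats differ, and "-" is treated as an empty referrer because many access-log formats write it that way.

```python
from urllib.parse import urlsplit


def referrer_is_llms_txt(referrer: str) -> bool:
    """True when the referrer URL's path is exactly /llms.txt."""
    if not referrer or referrer == "-":
        return False
    return urlsplit(referrer).path.lower() == "/llms.txt"


def count_llms_txt_referrals(records) -> int:
    """Count parsed log records whose referrer points at an llms.txt URL."""
    return sum(1 for record in records if referrer_is_llms_txt(record.get("referrer", "")))
```

    A count of zero from a check like this shows only that no referrer trail is visible. As noted above, it cannot tell a deferred crawl apart from a stripped referrer header.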


    How are LLMs actually finding pages to crawl?

    From the same referrer analysis: when bot requests did carry a referrer, it was, in the overwhelming majority of cases, the homepage of the domain.

    The behavioural picture is that crawlers, including AI crawlers, predominantly enter a site at the homepage and discover the rest of the site by following links from there, exactly as classical web crawlers always have. They are not, on this evidence, entering via llms.txt and fanning out from its curated list. The homepage and its internal linking remain the primary discovery surface. This is a strong argument that the fundamentals of crawlability and internal linking still matter far more than a curated llms.txt for getting your content seen.

    “On the referrer evidence, AI crawlers behave like classical crawlers. They enter at the homepage and follow links. llms.txt is not the front door.”


    Who is actually hitting llms.txt? The 22,494-hit breakdown

    This is the heart of the audit. Dataset D pre-classifies every recorded hit by agent family, and Dataset C lets me verify that classification against the raw user-agent strings. The two reconcile to the same total, 22,494 against 22,493, a one-hit difference from how the two extracts were generated.

    /llms.txt hits by agent type, 30-day window

    User-agent type | Hits | Share
    Other / unverified | 20,746 | 92.2%
    Search engine | 1,434 | 6.4%
    LLM / AI (verifiable) | 258 | 1.1%
    SEO / crawlers (declared) | 36 | 0.2%
    Dataset / training | 13 | 0.1%
    Social / preview | 7 | under 0.1%
    Total | 22,494 | 100%

    Source: Dataset D, User Agent Type by Hits.
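
    The share column can be recomputed from the hit counts alone. The Python sketch below does that; the hit counts are the ones in the table, while the 0.05% cut-off for printing “under 0.1%” is my assumption about the rounding convention.

```python
# Hit counts per user-agent type, as published in the table above.
HITS_BY_TYPE = {
    "Other / unverified": 20746,
    "Search engine": 1434,
    "LLM / AI (verifiable)": 258,
    "SEO / crawlers (declared)": 36,
    "Dataset / training": 13,
    "Social / preview": 7,
}


def share_label(hits: int, total: int) -> str:
    """Format a share to one decimal place, or 'under 0.1%' when it would round to 0.0."""
    pct = hits / total * 100
    if pct < 0.05:  # assumed cut-off: anything that would print as 0.0%
        return "under 0.1%"
    return f"{pct:.1f}%"


TOTAL_HITS = sum(HITS_BY_TYPE.values())
SHARES = {name: share_label(hits, TOTAL_HITS) for name, hits in HITS_BY_TYPE.items()}
```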

    Hits by named agent (the agents that are identifiable)

    Named agent | Operator family | Hits
    Googlebot | Search engine | 1,219
    OAI-SearchBot | OpenAI | 153
    BaiduSpider | Search engine | 127
    ChatGPT-User | OpenAI | 56
    Amazonbot | E-commerce / AI | 38
    Bingbot | Search engine | 36
    GPTBot | OpenAI (training) | 33
    AhrefsBot | SEO tool | 28
    Applebot | Search / AI | 13
    Bytespider | ByteDance | 12
    ClaudeBot | Anthropic | 10
    SemrushBot | SEO tool | 6
    Facebook External Hit | Social preview | 5
    PerplexityBot | Perplexity | 4
    Meta ExternalAgent | Meta | 2
    Perplexity-User | Perplexity | 1
    YouBot | You.com | 1
    CCBot | Common Crawl | 1

    Source: Dataset D, User Agent Name by Hits, excluding the “Unknown” aggregate of 20,746.

    The verifiable LLM/AI agents in full

    LLM/AI agent | Hits
    OAI-SearchBot (OpenAI) | 153
    ChatGPT-User (OpenAI) | 56
    GPTBot (OpenAI training) | 33
    ClaudeBot (Anthropic) | 10
    PerplexityBot (Perplexity) | 4
    Perplexity-User (Perplexity) | 1
    YouBot (You.com) | 1
    Total verifiable LLM/AI | 258

    Source: Dataset D, User Agent Type = LLM / AI.

    “Strip out the search engines and the unverifiable bots, and the entire verifiable-LLM interest in llms.txt, across a 30-day window on thousands of domains, amounts to 258 requests. Anthropic, Perplexity, and You.com combined: 16.”

    What is the 92% actually made of?

    The unverified bulk deserves scrutiny rather than a dismissive label. Using Dataset C’s raw user-agent strings, I found that it is dominated by a long tail of self-described tooling: site-statistics bots, monitoring bots, SEO site-audit crawlers, and a striking number of agents whose own user-agent strings advertise that they exist to audit or check llms.txt and AI-readiness.

    Composition of /llms.txt traffic by operator family (raw-string classification)

    Operator family | Hits | Share | Distinct hosts touched
    Other / unverified (tooling, monitors, auditors) | 20,772 | 92.3% | 3,134
    Google | 1,227 | 5.5% | 319
    OpenAI | 242 | 1.1% | 69
    Baidu | 127 | 0.6% | 36
    Amazon | 38 | 0.2% | 12
    Microsoft / Bing | 35 | 0.2% | 20
    Apple | 13 | 0.1% | 13
    ByteDance | 12 | 0.1% | 5
    Anthropic | 12 | 0.1% | 11
    Meta | 8 | under 0.1% | 4
    Perplexity | 5 | under 0.1% | 5
    You.com | 1 | under 0.1% | 1
    Common Crawl | 1 | under 0.1% | 1

    Source: Dataset C, full user-agent strings classified by operator. Minor differences from the agent-type table reflect the raw-string method counting AdsBot-Google and similar agents under their parent family.

    Two concentration facts stand out. The top ten user-agent strings alone accounted for 17,569 of 22,493 hits, which is 78.1% of all traffic to the file. And agents whose user-agent string self-labels with terms such as audit, monitor, readiness, llms.txt, crawler, GEO, or research represented 105 distinct agents and 13,508 hits, which is 60.1% of all traffic.

    “60% of all traffic to llms.txt came from agents that openly describe themselves as auditors, monitors, or readiness-checkers. The file’s biggest use case right now is being inspected to see whether it exists, a self-referential market rather than consumption by models.”
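
    The self-label figure comes from matching terms inside the raw user-agent strings. A rough Python sketch of that kind of match is below. The term list is the one quoted above; plain case-insensitive substring matching is an assumption, and it is deliberately naive (a short term such as "geo" will also match unrelated words), so treat the output as an upper bound rather than a precise count.

```python
# Terms quoted in the text; matching them as lowercase substrings is an assumption.
SELF_LABEL_TERMS = ("audit", "monitor", "readiness", "llms.txt", "crawler", "geo", "research")


def is_self_labelled(user_agent: str) -> bool:
    """True when the user-agent string contains any of the self-label terms."""
    lowered = user_agent.lower()
    return any(term in lowered for term in SELF_LABEL_TERMS)


def self_labelled_summary(hits_by_agent):
    """Return (distinct matching agents, their hits, their share of all hits)."""
    total = sum(hits_by_agent.values())
    matched = {ua: hits for ua, hits in hits_by_agent.items() if is_self_labelled(ua)}
    matched_hits = sum(matched.values())
    share = matched_hits / total if total else 0.0
    return len(matched), matched_hits, share
```

    Here hits_by_agent is a hypothetical mapping from full user-agent string to hit count, the same shape as Dataset C once summed across hosts.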

    This is the most under-reported reality of llms.txt in mid-2026. Raw hit counts on the file are rising, and it is tempting to read that as LLMs adopting it. The composition says otherwise. A large share of the traffic is the GEO ecosystem checking itself: tools verifying that a customer has the file, monitors polling for changes, readiness-scanners selling the idea that the file matters. That activity is real, but it is not evidence that any model is using the file to answer questions.


    Host-level reality check

    Beyond raw hits, I cross-referenced which hosts have a real file against which hosts received any /llms.txt traffic.

    Hosts: file presence against received traffic

    Measure | Count
    Hosts returning 200 at /llms.txt (www-normalised) | 130
    Hosts that received at least one /llms.txt request (www-normalised) | 2,649
    Hosts that both have a file and received a hit | 80
    Hosts that have a file but recorded no hit | 50
    Distinct hosts receiving any /llms.txt hit (raw) | 3,236

    Source: Datasets B and C, joined on www-normalised host.

    Two things stand out. First, the vast majority of /llms.txt requests land on hosts that do not even have the file: bots and tools are probing for it speculatively and hitting 404s. Second, of the hosts that do have a real file, more than a third, 50 of 130, saw no recorded hit at all in the window. Presence and attention are only loosely coupled.
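
    The join behind this table is a set comparison on normalised host names. The Python sketch below shows the idea; stripping a leading "www." and lowercasing is my assumption of what www-normalisation means here, and the function names are illustrative.

```python
def normalise_host(host: str) -> str:
    """Lowercase a host name and drop a leading 'www.' (assumed normalisation)."""
    host = host.strip().lower()
    return host[4:] if host.startswith("www.") else host


def presence_vs_traffic(hosts_with_file, hosts_with_hits):
    """Compare hosts that serve a file against hosts that received a request."""
    files = {normalise_host(h) for h in hosts_with_file}
    hits = {normalise_host(h) for h in hosts_with_hits}
    return {
        "file_and_hit": len(files & hits),
        "file_no_hit": len(files - hits),
        "hit_no_file": len(hits - files),
    }
```

    Normalising before the join matters because the crawl and the logs may record the same site with and without the www prefix, which would otherwise look like two different hosts.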



    Limitations and an invitation to challenge

    Here is where this audit stops short.

    User agents are self-declared, so the 92.2% Other bucket could hide real AI activity behind generic strings. I have deliberately under-counted LLM activity rather than over-claim it. The hit datasets carry no per-event timestamps, so the 30-day window is the extraction window rather than a field I can re-derive. Fetched does not mean used: nothing in server logs can prove that any provider used llms.txt content in a model output, because logs show requests, not downstream use. This is a snapshot, a single 30-day window compared qualitatively to a prior one, not a continuous time series. And referrer behaviour is provider-dependent, so the absence of a referrer trail is strong evidence of no observable routing rather than absolute proof that no provider ever crawls from the file.

    If you can replicate, extend, or contradict any of this with your own logs, I want to hear about it. I will investigate and publish a visible correction if anything here proves wrong.


    Frequently asked questions

    How many websites actually have an llms.txt file? In this audit, of 4,685 domains probed, 137 returned a working 200 response at /llms.txt, which is about 2.9%. If you require the file to be served as plain text the number is 111, and if you require it to contain real content it drops to 20.

    What percentage of websites have llms.txt? On this AEM-hosted sample, between 0.4% and 2.9% depending on how strictly you define a working file. The headline figure of 2.9% counts any 200 response; the strict figure of 0.4% counts only plain-text files with measurable content.

    Do large language models actually read llms.txt? Rarely, on this evidence. Verifiable LLM agents accounted for 258 of 22,494 requests to the file, which is 1.1% of all traffic, over a 30-day window across thousands of domains.

    Does ChatGPT use llms.txt? OpenAI’s search and user agents, OAI-SearchBot and ChatGPT-User, made 209 requests across roughly 69 hosts. That is real but tiny, and there is no evidence in the logs that the file drives any onward crawling.

    Does Google use llms.txt? Googlebot is now the single largest named crawler hitting the file, with 1,219 requests. Google has also begun including llms.txt in Lighthouse checks. A fetch is not proof of use in ranking or AI features, but it is a clear change from a year ago.

    Does Gemini or Google AI Mode use llms.txt? I cannot confirm this from the data. What I can confirm is that Googlebot is fetching the file. Whether that content feeds AI Mode or AI Overviews is plausible but unproven on these logs.

    Does Claude use llms.txt? Anthropic’s ClaudeBot made 10 requests to the file across the entire dataset. That is negligible.

    Does Perplexity use llms.txt? Perplexity’s agents made 5 requests in total, PerplexityBot and Perplexity-User combined. That is negligible.

    Is llms.txt worth creating in 2026? My view is yes, but as cheap insurance rather than a growth lever. It costs little to create, Google is now hitting it, and the upside is asymmetric if providers begin to consume it. Do not expect it to move LLM citations today.

    Will llms.txt improve my rankings? There is no evidence in this data that it does. Crawlers enter via the homepage and follow internal links. Classical crawlability and internal linking remain far more important.

    Will llms.txt get my brand cited in AI answers? Probably not at present. The models that drive consumer AI answers are barely touching the file, and there is no observable crawl activity downstream of it.

    Do LLMs crawl the .md files listed in llms.txt? There is no evidence that they do so directly from the file. I found zero requests whose referrer was an llms.txt URL, so either crawlers do not crawl immediately after reading it, or they do not preserve the referrer.

    How do LLMs and AI crawlers find pages to crawl? Predominantly via the homepage. When requests carried a referrer it was almost always the domain homepage, indicating crawlers enter there and follow internal links, exactly as classical crawlers do.

    Should llms.txt be plain text or HTML? Plain text. In this audit, 23 of the 175 200-responses were served as HTML, and those are almost certainly catch-all pages rather than real llms.txt files. A real file should return text/plain.

    Why do so many llms.txt requests return a 404? Because most sites do not have the file. In this crawl, 76.9% of probed URLs returned a 404. Many bots and tools probe for /llms.txt speculatively and simply hit a missing file.

    What bots hit llms.txt the most? The largest single sources are unverified tooling and monitoring bots, followed by Googlebot as the largest named crawler. The top ten user-agent strings alone made up 78.1% of all traffic to the file.

    Are most llms.txt hits really from AI models? No. 92.2% of traffic came from agents that are neither mainstream search engines nor verifiable LLMs, largely SEO tools, monitors, and AI-readiness auditors. Only 1.1% came from verifiable LLMs.

    What is an llms.txt auditor bot? It is a crawler, often from a GEO or SEO tool, whose purpose is to check whether a site has an llms.txt file and report on it. In this dataset, agents that self-label as auditors, monitors, or readiness-checkers accounted for 60.1% of all traffic to the file.

    Does having an llms.txt file guarantee bots will read it? No. Of the 130 hosts with a real file, 50 recorded no hit at all in the window. Presence and attention are only loosely coupled.

    How big should an llms.txt file be? The proposal envisions a curated index, but in practice the files that had content in this audit were very short, between 2 and 69 words, suggesting most are minimal stubs. Aim for a genuinely useful, curated list of your most important pages rather than a token file.

    Is llms.txt the same as robots.txt or sitemap.xml? It is similar in concept, a small conventional file at a predictable path, but different in standing. robots.txt and sitemap.xml are honoured by documented crawlers, whereas llms.txt only delivers value if model providers choose to read it, and on this evidence most do not yet.

    Did anything change with llms.txt between 2025 and 2026? The biggest change is Google. Googlebot went from a non-presence to the largest named crawler at the file, and Google added it to Lighthouse. Everything else stayed roughly the same: verifiable LLM usage remained negligible, and no referrer trail from the file appeared.

    Is this the largest llms.txt study? To my knowledge, yes, by number of distinct domains and by volume of bot traffic examined. The data comes from real customer domains hosted on Adobe Experience Manager, including some of the world’s largest websites.

    Where does the data in this analysis come from? From server-log and crawl data across customer domains hosted on Adobe Experience Manager, analysed with a server log file analysis tool over a 30-day window, with a companion crawl of /llms.txt paths dated 29 May 2026.

    How was the data anonymised? No customer, brand, or third-party vendor names appear anywhere in this article. Every identifier has been removed and replaced with a neutral category label, and only aggregate summary figures are published.

    Can I reproduce these findings myself? Yes, in principle. Crawl /llms.txt across your domain set and record status, content type, and word count; query 30 days of server logs for requests to /llms.txt grouped by host and user-agent string; classify user agents conservatively; and separately query the referrer field for any request whose referrer is /llms.txt.

    What is the single most important takeaway? That raw hit counts on llms.txt are misleading. Most of the traffic is the GEO ecosystem checking itself, not models consuming the file. Create the file because it is cheap and Google is now looking at it, but keep your real investment in homepage strength and internal linking.

    A note on the data and on disclosure. The findings in this article come from server-log and crawl data across customer domains hosted on Adobe Experience Manager (AEM). I analysed this data directly using a server log file analysis tool. I work in this field, and all views expressed here are my own and do not represent those of my employer. No customer, brand, or third-party vendor names appear anywhere in this article. Every identifier has been removed and replaced with a neutral category label.


    Written by Flavio Longato and published June 2026 on longato.ch. All views my own and not those of my employer. Companion analysis: llms.txt, my recommendation, August 2025. Spotted an error? Get in touch via longato.ch and I will publish a visible correction.

  • How to Write GEO Prompts for Reliable LLM Insights

    A big part of my work at Adobe is working with customers to ensure that Adobe LLM Optimizer is of value to them. This often involves me and my team auditing the prompts in their account. Based on the hundreds of customer meetings I’ve had, I’d say that 90% of them don’t quite understand the shift from SEO queries to LLM prompts.

    Therefore, I’ve been investing my time in trying to answer the one question that always comes up when I speak to marketing teams about GEO:

    How do you write prompts that give you reliable, repeatable insights?

    “If you are measuring visibility inside large language models, your prompts are not casual questions. They are instruments. They shape the data you collect. If they are vague or inconsistent, your results will drift. If they are precise and structured, your results become stable and meaningful.” – Flavio Longato

    In my experience working with GEO, the key is simple: treat prompts like test cases.

    Think of Prompts Like Test Cases

    Most marketers still treat prompts like search queries. That’s usually where GEO measurement starts to break down.

    A prompt is closer to a test case than a keyword. A good test case is realistic, specific, and repeatable. If the input changes too much, you don’t know whether the output changed because of the system or because of the test itself.

    Recent research highlights why this matters more than many teams expect. A large study from SparkToro ran identical prompts across multiple AI systems and compared the brand recommendations returned each time. Even when nothing changed in the input, the results were highly inconsistent. Brand lists shifted, ordering changed, and sometimes completely different companies appeared. In many scenarios, there was less than a 1% chance of receiving the same set of brands twice.

    This doesn’t mean AI visibility is unreliable. It means the input needs stronger structure.

    When a prompt is too broad, the model has many valid directions it can explore. One run might emphasise pricing, another might focus on features, and a third might lean on brand familiarity. From a GEO perspective, that variability looks like ranking movement, but in reality it’s just different reasoning paths.

    That’s why I recommend building prompts with three consistent elements:

    • A clear goal – what the user is trying to achieve
    • A constraint – experience level, budget, region, or use case
    • Context – comparison framing or requirements

    For example:

    Instead of:
    “Best PDF software”

    Use:
    “I need a PDF tool for a beginner that lets me convert to Word and edit files. I’m comparing two options and want something simple.”

    The second version behaves like a controlled experiment. It narrows interpretation and reduces randomness across runs.
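
    One way to keep the three elements consistent across a prompt set is to store them as separate fields and render the prompt from them. The Python sketch below is an illustration under that assumption; the GeoPrompt class and its field names are hypothetical, and the split of the PDF example into goal, constraint, and context is my own reading of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoPrompt:
    """A prompt treated as a test case: goal, constraint, and context kept separate."""
    goal: str
    constraint: str
    context: str

    def render(self) -> str:
        """Join the three elements into one prompt, refusing incomplete test cases."""
        parts = [self.goal.strip(), self.constraint.strip(), self.context.strip()]
        if not all(parts):
            raise ValueError("goal, constraint and context must all be non-empty")
        return " ".join(parts)


pdf_prompt = GeoPrompt(
    goal="I need a PDF tool that lets me convert to Word and edit files.",
    constraint="I am a beginner.",
    context="I'm comparing two options and want something simple.",
)
```

    Because the dataclass is frozen, a stored prompt cannot be edited in place between runs, which is the property a repeatable test case needs.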

    The SparkToro findings reinforce this approach. Their data suggests that tracking visibility across repeated, structured prompts is far more reliable than evaluating a single response or focusing only on position. Brands that appear consistently across many executions are more likely to be part of the model’s core consideration set.
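
    Tracking visibility across repeated runs can be reduced to a simple appearance rate: the share of runs in which a brand shows up at all. The Python sketch below assumes the brand names have already been extracted from each response, which is the hard part and is out of scope here; the function name and the case-insensitive comparison are illustrative choices, not a method taken from the SparkToro study.

```python
from collections import Counter


def appearance_rate(runs):
    """Share of runs in which each brand appears at least once.

    `runs` is a list of brand-name lists, one list per execution of the same prompt.
    """
    if not runs:
        return {}
    counts = Counter()
    for brands in runs:
        # A set counts each brand once per run, whatever its position or repetition.
        counts.update({brand.strip().lower() for brand in brands})
    return {brand: seen / len(runs) for brand, seen in counts.items()}
```

    A brand with a rate near 1.0 across many executions is a better signal than a first-place mention in a single response, which is the point the paragraph above makes.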

    Consistency doesn’t come from the model. It comes from the way you design the prompt.

    Why treat it as a software test:

    In software testing, a good test case is:

    • realistic
    • specific
    • repeatable

    The same logic applies to GEO.

    A realistic test case mirrors real user behaviour. It reflects how someone would genuinely ask for help. A specific test case defines intent clearly. A repeatable test case produces consistent outputs when run under the same conditions.

    If your prompt is too broad, the AI assistant has too many valid directions it can take. Each direction may be reasonable, but your visibility measurement becomes unstable. One day your brand appears. The next day it does not. Nothing meaningful has changed except the interpretation space.

    That is not a visibility shift. It is measurement noise.

    Why Broad Prompts Create Random Visibility

    When a prompt lacks structure, the model fills in the gaps. It guesses intent. It assumes context. It selects one of many possible frames.

    For example:

    • “What’s the best CRM?”
    • “How should I improve my marketing?”
    • “Which tool is better?”

    Each of these prompts is valid. Each has multiple reasonable answers. But from a GEO perspective, they are weak test cases. The output can vary based on subtle sampling changes, updates in model training, or shifting internal weighting.

    Your visibility score becomes volatile because the prompt itself is unstable.

    As discussed in industry research around generative search and model behaviour, consistency of input is essential for consistency of output. This principle also appears in discussions about large language model evaluation in resources such as OpenAI research publications.

    The Structure of a Reliable GEO Prompt

    In practice, reliable prompts usually include three elements:

    1. A Clear Goal

    The assistant needs to know what it is helping with.

    Examples:

    • “Help me choose”
    • “Recommend the best option”
    • “Compare these two tools”
    • “Rank these solutions”

    Without a clear goal, the model may default to explanation rather than decision support.

    2. A Constraint

    Constraints narrow the solution space. They reduce ambiguity.

    Examples:

    • “For a beginner”
    • “For a small B2B marketing team”
    • “With a limited budget”
    • “For an e-commerce company in Switzerland”

    Constraints anchor the response to a defined persona or situation. This increases repeatability because the model does not need to infer who the user is.

    3. Context

    Context defines the frame of comparison.

    Examples:

    • “I am comparing HubSpot and Pipedrive.”
    • “I need email automation and CRM integration.”
    • “We have five employees and no technical team.”

    When context is explicit, the assistant does not need to guess requirements. Fewer assumptions lead to more stable outputs.

    In short, a strong GEO prompt looks like this:

    “Help me choose between Tool A and Tool B for a beginner marketing manager at a small B2B company. We need CRM integration and simple reporting.”

    That is a test case. It is realistic. It has intent. It has constraints. It has context. It can be run again and compared over time.

    Consistency Is More Important Than Creativity

    In SEO, creativity can help content stand out. In GEO measurement, creativity can damage reliability.

    If you rewrite your prompts every week, you are not tracking model visibility. You are testing new scenarios.

    I recommend using a consistent template. For example:

    • Goal: Job to Be Done (JTBD) of the page
    • Persona: Who is the target audience of the brand, site and landing page
    • Constraints: What friction point is this page trying to resolve
    • Comparison set: What do other competitors do on similar pages

    By keeping the structure stable, you isolate changes. If results shift, you can more confidently attribute that shift to model behaviour rather than prompt variation.
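
    One way to keep the structure stable is to store the template as data rather than as free text. The sketch below is an illustration only: the class name, field names, and sentence pattern are assumptions of mine, chosen so that the output matches the example prompt used earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoPrompt:
    """Hypothetical container for the four template fields."""
    goal: str
    persona: str
    constraints: str
    comparison_set: tuple[str, ...]

    def render(self) -> str:
        # The sentence pattern never changes; only the field values do.
        options = " and ".join(self.comparison_set)
        return f"{self.goal} between {options} for {self.persona}. {self.constraints}"


prompt = GeoPrompt(
    goal="Help me choose",
    persona="a beginner marketing manager at a small B2B company",
    constraints="We need CRM integration and simple reporting.",
    comparison_set=("Tool A", "Tool B"),
)
print(prompt.render())
```

    Because the object is frozen, a changed persona or comparison set has to be a new object, which makes prompt drift visible instead of silent.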

    This is especially important when measuring brand inclusion or ranking within generative responses, a topic increasingly discussed in the context of generative engine optimisation.

    Version Your Prompts

    Over time, your understanding of GEO will improve. Your prompts will evolve. That is normal. But evolution must be controlled.

    I always recommend versioning prompts. Keep a simple log:

    • Prompt v1.0 – Initial baseline
    • Prompt v1.1 – Added constraint
    • Prompt v1.2 – Refined persona
    • Prompt v2.0 – New comparison set

    When visibility changes, you can check whether:

    • The model changed
    • Your configuration changed
    • The prompt changed

    Without versioning, you lose traceability.
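
    A minimal sketch of that traceability check, assuming you record the prompt version, model identifier, and configuration alongside every measurement run. The record fields and the sample values are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical metadata stored next to each visibility measurement."""
    prompt_version: str
    model: str
    config: str


def what_changed(before: RunRecord, after: RunRecord) -> list[str]:
    """Name every field that differs between two measurement runs."""
    return [
        f.name
        for f in fields(RunRecord)
        if getattr(before, f.name) != getattr(after, f.name)
    ]


baseline = RunRecord("v1.1", "model-2026-05", "temperature=0.2")
latest = RunRecord("v1.1", "model-2026-06", "temperature=0.2")
print(what_changed(baseline, latest))  # ['model']
```

    If the list is empty and visibility still moved, the change is more likely run-to-run variation or a change in the underlying sources than something in your setup.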

    This approach mirrors good experimental practice. In evaluation frameworks such as those discussed by Google AI research, reproducibility is central. GEO should follow the same discipline.

    Avoid Frequent Structural Changes

    There is another practical issue: historical comparability.

    If you continuously add and delete topics, entities, or comparison options in your GEO tracking setup, your visibility baseline shifts. You may see score drops that are not performance issues, but structural changes.

    For example:

    • Adding new competitors changes ranking distribution.
    • Removing requirements alters response framing.
    • Switching persona definitions shifts relevance weighting.

    When you make large structural edits, treat them as a new measurement phase. Do not compare them blindly to old data.

    Stable input produces stable trend lines.

    Build a Prompt Library

    In my work, I build a prompt library rather than a loose collection of questions. Each prompt:

    • Has a defined intent
    • Targets a clear user scenario
    • Uses consistent structure
    • Is version controlled
    • Is tied to a measurement objective

    This transforms GEO from experimentation into systematic analysis.

    Over time, patterns emerge:

    • Which prompts consistently surface your brand?
    • Which personas trigger competitor mentions?
    • Where does the model hesitate or diversify?

    Those patterns only appear when your inputs are disciplined.

    From Keywords to Intent-Based Test Cases

    In traditional SEO, we optimised for keywords. In GEO, we optimise for intent expressions.

    A keyword like “best CRM” is not enough. A structured prompt that simulates a real buying decision is far more powerful.

    This shift aligns with broader industry commentary on search evolution, including perspectives shared on platforms such as Search Engine Land.

    GEO is not about ranking for fragments. It is about appearing in structured decision contexts.

    Ground everything

    Grounding is the process of ensuring LLM responses are linked to real-world data and information. If you simply prompt an LLM with a task, it will often hallucinate. To reduce that risk, I ground the data on:

    • The website and content itself
    • Branding material for the website
    • SEO Data (query data, page metrics, backlink information and competitor data)

    Final Thoughts

    Reliable GEO insights do not come from clever phrasing. They come from disciplined design.

    I treat every prompt as a test case:

    • Realistic
    • Specific
    • Repeatable

    I include a clear goal, defined constraints, and explicit context. I keep templates consistent. I version changes. I avoid unnecessary structural edits.

    When you approach prompts this way, your visibility data becomes meaningful. Trends become interpretable. Optimisation becomes strategic rather than reactive.

    In GEO, measurement quality starts with prompt quality. If you control the input, you can trust the insight.

  • How Do LLMs Choose Citations? The Selection Process

    Large language models decide which citations to include by retrieving external sources at query time and evaluating them across multiple expanded sub-queries. Rather than relying on a single ranking, they prioritise sources that show consistent relevance and visibility across related intents.

    Large language models (LLMs) do not “remember” the live web. They are trained on vast datasets with a defined cut-off date. Any information published after that point is not part of the core model. That is why many modern systems rely on retrieval processes to surface up-to-date information and attach citations to their answers.

    If there is no retrieval layer, there is usually no citation. The model can generate fluent text from its training data, but it cannot point to a current, verifiable source. Citations typically appear only when a retrieval-augmented pipeline is in place.

    Training Data vs Real-Time Retrieval

    All foundation models have a knowledge cut-off. For example, if a model is finalised in May, it cannot natively “know” what was published in June. To bridge that gap, many systems use Retrieval-Augmented Generation (RAG). This method allows the model to query external search indexes or document stores at runtime.

    In simple terms, the model does not just answer from memory. It searches, retrieves, ranks, and then generates.

    Prompt Fan-Out: Why One Question Becomes Many

    When you submit a prompt, the system may expand it into several smaller queries. This is often called “fan-out”.

    Take this example:

    “I need a PDF software that allows me to save a PDF to Word and edit it.”

    Instead of sending that exact sentence to a search engine, the system may break it into sub-queries such as:

    • Best PDF editing software
    • Convert PDF to Word software
    • Edit PDF after converting to Word
    • PDF to DOCX tools comparison

    This expansion improves coverage. It captures subtopics, entities, and user intent variations. It also increases the chance of retrieving comprehensive and current information.

    How Sources (citations) Are Evaluated

    Once results are retrieved, the system evaluates them across several dimensions:

    • Relevance to each sub-query
    • Topical consistency
    • Frequency of appearance across different query paths
    • Domain authority signals
    • Sentiment alignment and contextual fit

    Here is the crucial point: citations are rarely chosen from a single ranking position. They are often selected based on aggregate visibility across multiple fan-out queries.

    If Brand A appears consistently across “best PDF editor”, “convert PDF to Word”, and “edit PDF tool comparison”, it may outrank a competitor that ranks first for only one query. Consistency across sub-intents increases average retrieval strength.
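
    The arithmetic behind that claim can be sketched with a simple stand-in. Here I sum reciprocal ranks across the fan-out queries, an assumption borrowed from reciprocal rank fusion and not a description of how any specific assistant scores sources. The rankings are hypothetical.

```python
from collections import defaultdict


def aggregate_visibility(rankings: dict[str, list[str]]) -> dict[str, float]:
    """Sum 1/position for each brand across all fan-out queries."""
    scores: dict[str, float] = defaultdict(float)
    for results in rankings.values():
        for position, brand in enumerate(results, start=1):
            scores[brand] += 1.0 / position
    return dict(scores)


# Hypothetical rankings: Brand A is never first, but it is always present.
rankings = {
    "best PDF editor": ["Brand B", "Brand A", "Brand C"],
    "convert PDF to Word": ["Brand D", "Brand A", "Brand E"],
    "edit PDF tool comparison": ["Brand F", "Brand A", "Brand C"],
}

scores = aggregate_visibility(rankings)
for brand, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{brand}: {score:.2f}")
```

    Under this scoring, Brand A reaches 1.50 from three second places, while each brand that ranks first for a single query stops at 1.00.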

    This behaviour aligns with principles seen in modern information retrieval, such as BERT-based ranking models, where semantic coverage and contextual relevance matter more than exact keyword matching.

    Why You Sometimes See Links — and Sometimes Do Not

    If the model uses RAG, citations can be attached to specific claims. These links are retrieved from external sources at query time.

    If the model responds purely from its trained parameters, any URLs mentioned in the text are generated content. They are not dynamically retrieved references. In that case, they are not citations in the strict retrieval sense.

    So the rule is simple:

    • No retrieval layer → no genuine citation
    • Retrieval layer present → citations selected from ranked results

    What This Means for SEO and GEO Strategy

    This has direct implications for search optimisation and generative engine optimisation.

    LLMs do not cite a brand simply because it ranks position one for a single keyword. They cite brands that demonstrate consistent topical strength across multiple related queries.

    The practical takeaway is clear:

    • Cover the topic in depth
    • Address sub-intents explicitly
    • Rank for variations of the core query
    • Maintain consistent brand visibility across related searches

    When your content appears repeatedly across fan-out queries, your average retrieval strength increases. That increases the probability of citation.

    In other words, citation likelihood is not driven by a single ranking. It is driven by aggregate visibility across multiple intent paths.

    Depth Beats Position

    A site ranking ninth but covering every sub-intent thoroughly may outperform a site ranking first for one narrow query. Retrieval systems reward breadth, consistency, and contextual alignment.

    For practitioners, this reinforces a familiar truth. Traditional SEO still matters. Strong technical foundations, structured content, and first-page visibility remain essential. But depth and topical completeness are now equally critical.

    Final Thoughts

    Large language models select citations based on aggregated retrieval signals across expanded query paths. They evaluate frequency, relevance, and consistency rather than relying on a single top-ranking result.

    If you want to be cited, you must be visible across the ecosystem of related queries. Comprehensive coverage increases retrieval strength. Retrieval strength increases citation probability.

    SEO is not obsolete. It has simply become multidimensional.

  • Bing AI Performance Report: GEO Impact Analysis

    Microsoft has introduced a new AI Performance Report inside Bing Webmaster Tools. In my view, this marks one of the first real steps toward measuring visibility in AI experiences, not just traditional search rankings.

    In this article, I want to summarise what I explained in my video: what the report shows, how the metrics work, where the gaps still are, and why this matters if you care about Generative Engine Optimization (GEO).


    Why Microsoft Released the AI Performance Report

    For years we measured success using clicks, impressions, and rankings. That model starts to break down once AI experiences such as Copilot summarize content directly in the answer.

    The new report introduces an analytics layer focused on AI citations instead of classic SERP performance.

    From my perspective, the goal is clear:

    • Help publishers understand when their content is used as a source in AI answers
    • Provide visibility into which topics trigger citations
    • Move measurement closer to influence, not just traffic

    Microsoft describes this as giving publishers insight into how content appears across AI experiences within Bing.


    What the New Metrics Actually Mean

    Inside the report, there are a few core metrics that matter.

    Total Citations

    This shows how often pages from your website appear as sources in AI-generated responses during a selected time period.

    This is not a ranking signal and it is not traffic. It is simply confirmation that Bing’s AI systems referenced your content.

    Average Cited Pages

    This metric represents the average number of unique pages cited per day.

    I see this as a rough indicator of topical depth. If more pages are cited, it often means Bing recognizes broader authority around a subject.

    Page-Level Citation Data

    You can drill down to see:

    • Which URLs are cited
    • How frequently they appear
    • The query themes connected to those citations

    One important detail: Bing does not show the actual prompts. Instead, it shows the “fan-out” search queries that likely contributed to the AI response.


    The Biggest Limitation: No Prompt Data

    One thing I was really hoping for was access to the actual prompts.

    Right now:

    • You do not see the original AI question
    • You do not see click-through rate
    • You do not see user engagement from the AI answer itself

    Instead, Bing exposes the expanded queries derived from prompts.

    This is useful, but it means analysts still need to reverse-engineer intent rather than measure it directly.


    How This Differs From Traditional Search Performance

    Here is how I personally separate the two reporting models.

    Classic Search Performance | AI Performance Report
    Focus on clicks and rankings | Focus on citations
    Measures SERP behavior | Measures AI usage
    Keyword-driven analysis | Prompt fan-out analysis
    Visibility tied to traffic | Visibility tied to influence

    In short, we are moving from measuring “Did someone click?” to “Was my content used as a source?”

    That is a major shift in how discovery works.


    Why Citations Matter Even Without Clicks

    One of the key points I make in the video is that influence now happens even when there is no visit.

    If your content is cited:

    • Your brand or expertise shapes the answer
    • Your information influences user decisions
    • But analytics may show zero traffic

    This is exactly why GEO is becoming critical. Visibility is no longer limited to blue links.


    How This Connects to Adobe LLM Optimizer and GEO Workflows

    Even with this new report, I still see tools like Adobe LLM Optimizer as highly relevant.

    Why?

    Because Bing still does not provide:

    • Prompt data
    • Cross-platform visibility (ChatGPT, Gemini, etc.)
    • Deep competitive insights

    In my opinion, the real opportunity is combining Bing’s citation data with:

    • Log file analysis
    • Prompt simulations
    • LLM monitoring tools

    My team is already exploring how to ingest these grounded queries and use them to better understand prompt behavior.


    Practical Takeaways From the Report

    If you are working on GEO or AI visibility, here is how I would approach this new data:

    1. Identify URLs with high citation counts and expand those topic clusters.
    2. Look at fan-out queries to understand how prompts branch into multiple searches.
    3. Compare citation activity with crawl logs to validate AI usage patterns.
    4. Treat citations as an influence metric, not a traffic metric.

    What This Report Does Not Cover (Yet)

    It is important to set expectations.

    Right now the report only reflects:

    • Bing Copilot and Bing AI experiences
    • Bing’s own ecosystem

    It does not include:

    • ChatGPT
    • Perplexity
    • Gemini
    • Other LLM platforms

    So while it is a big step forward, it is still just one piece of the AI visibility puzzle.


    My Conclusion

    I see this release as the first official GEO-style reporting feature from a major search platform.

    It shows that measurement is shifting away from rankings and toward AI usage and citations.

    But we are still early.

    Without prompts, cross-platform data, or CTR visibility, we need to combine this report with external tooling and deeper analysis.

    Still, this is a strong signal of where search analytics is heading next.


  • What Is LLM Crawling and Why Does It Matter?

    Large language models now crawl websites much like search engines do. But many site owners have no idea their pages are invisible to these systems. If your content cannot be read by an LLM, you lose a growing source of traffic and citations.

    I have spent years working on technical SEO, and I can tell you that the overlap between search engine optimisation and LLM readability is huge. The same foundations that help Google read your site also help ChatGPT, Perplexity, and other AI tools find and reference your content. Yet there are key differences that catch people off guard.

    How LLMs Crawl and Process Web Content

    LLM crawling follows a familiar pattern. A bot visits your site, fetches your pages, and reads the content. In traditional SEO, we talk about crawling, indexing, and ranking. With LLMs, the steps are crawling, tokenisation, and rendering. The bot arrives, collects the text, breaks it into tokens, and stores it for later use in responses.

    If a page cannot be crawled or read, no AI system will use it as a source. That means no citations, no referrals, and no visibility in AI-generated answers. This is a real problem for businesses that rely on organic discovery. According to Google’s crawler documentation, the basic principles of making content accessible to bots have not changed much. But LLMs add a few new wrinkles.

    Common Technical Blockers

    Several technical issues stop LLMs from seeing your content. The most common one is robots.txt. When LLM crawlers first appeared around 2023 and 2024, many website owners blocked them out of fear. They worried that models would absorb their content without giving credit. Now it is 2026, and that stance is counterproductive. More people use LLMs every day. Blocking these bots means you opt out of a real traffic channel.

    Another blocker that surprised many site owners was CDN default settings. Cloudflare, for example, started blocking AI crawlers by default for new customers in July 2025. If you use a CDN, check your bot management settings. You might be blocking AI crawlers without knowing it. In your server logs or monitoring tools, this shows up as a string of 403 or 404 errors for known LLM user agents.
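
    A rough way to spot that pattern is to count 403 and 404 responses per LLM user agent in your access logs. The sketch below assumes logs in the common "combined" format and uses simplified, made-up sample lines; real user-agent strings are longer, and your log layout may differ.

```python
import re
from collections import Counter

LLM_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Combined log format: "METHOD path protocol" status size "referer" "user-agent"
LINE = re.compile(
    r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def blocked_llm_hits(lines: list[str]) -> Counter:
    """Count 403/404 responses per (bot, status) for known LLM user agents."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE.search(line)
        if not match or match.group("status") not in ("403", "404"):
            continue
        for bot in LLM_AGENTS:
            if bot in match.group("agent"):
                counts[(bot, match.group("status"))] += 1
    return counts


# Simplified sample lines, invented for illustration.
sample = [
    '203.0.113.5 - - [01/Jun/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 403 0 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [01/Jun/2026:10:00:02 +0000] "GET /blog HTTP/1.1" 403 0 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jun/2026:10:01:00 +0000] "GET /blog HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [01/Jun/2026:10:02:00 +0000] "GET /old HTTP/1.1" 404 0 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

blocked = blocked_llm_hits(sample)
for (bot, status), hits in blocked.items():
    print(f"{bot}: {hits} responses with status {status}")
```

    A bot that only ever receives 403 responses is almost certainly being blocked at the CDN or firewall level rather than by robots.txt.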

    Other blockers include:

    • Inconsistent canonical tags that waste crawl budget
    • URL parameters creating duplicate pages
    • Content behind logins or paywalls
    • Heavy interstitials that block the page content

    These are familiar problems in SEO. But with LLMs, the tolerance is even lower. A search engine might still manage to parse a messy page. An LLM bot often will not bother. As Search Engine Journal explains, crawl budget matters for every type of bot, not just Googlebot.

    Why JavaScript Rendering Is the Biggest Problem

    Here is my contrarian take: the single biggest barrier to LLM visibility is not robots.txt or CDN settings. It is client-side JavaScript rendering. Most people in the SEO world stopped worrying about JavaScript a couple of years ago because Google got very good at rendering it. That gave everyone a false sense of security.

    LLMs do not render JavaScript the way Google does. When an LLM bot visits a page, it typically reads the raw HTML without executing scripts. If your content loads through React, Angular, Vue, or any other client-side framework, the bot may see an empty shell. I have personally audited sites where only 70 to 75 percent of the page content was visible to LLM crawlers. That is a huge chunk of missing information.

    From my own experience building and managing websites early in my career, I know how painful it is to fix rendering issues at the infrastructure level. You need developer resources, time, and tickets that sit in a backlog for months. Server-side rendering or static site generation is the proper fix, but it is slow to implement. Edge rendering solutions offer a faster workaround. They pre-render your pages and serve the full HTML to LLM bots, pushing visibility from partial to complete.

    How to Check Your LLM Visibility

    You should not guess whether LLMs can see your content. Test it. One practical method is to compare the word count of a fully rendered page (what a human browser sees) against what an LLM bot receives (the raw HTML response). A large gap means you have a rendering problem.
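
    A minimal sketch of that comparison using only the Python standard library. It assumes you already have both versions of the page as strings: the raw HTML response, and the rendered DOM saved from a browser. The two sample documents are invented for illustration.

```python
from html.parser import HTMLParser


class WordCounter(HTMLParser):
    """Count visible words, ignoring script, style, and noscript content."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.words = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def word_count(html: str) -> int:
    parser = WordCounter()
    parser.feed(html)
    parser.close()
    return parser.words


def visible_share(raw_html: str, rendered_html: str) -> float:
    """Share of the rendered word count that is already in the raw HTML."""
    rendered = word_count(rendered_html)
    return word_count(raw_html) / rendered if rendered else 0.0


raw = '<body><div id="root"></div><script>render()</script><p>Welcome to our online shop</p></body>'
rendered = (
    '<body><div id="root"><h1>Espresso machine</h1><p>Price and reviews</p></div>'
    "<p>Welcome to our online shop</p></body>"
)
print(f"{visible_share(raw, rendered):.0%} of the rendered text is in the raw HTML")
```

    Word counts are a blunt instrument, but they are enough to tell a fully server-rendered page from an empty application shell.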

    Browser extensions and specialised tools can automate this comparison. They highlight exactly which sections of your page are invisible to AI crawlers. This gives you hard data to bring to your development team. Instead of saying “we think there is a problem,” you can say “42 percent of our product page content is hidden from LLM bots, and here is the proof.”

    You should also review your robots.txt file and check for any directives that block known LLM user agents like GPTBot, ClaudeBot, or PerplexityBot. A quick audit of your CDN settings is equally important.
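
    The robots.txt part of that audit can be sketched with Python's built-in parser. The robots.txt content and the URL below are hypothetical, and the standard-library parser is only an approximation of how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

LLM_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_agents(robots_txt: str, url: str) -> list[str]:
    """Return the LLM user agents that may not fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in LLM_AGENTS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks one AI crawler and allows everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(robots_txt, "https://example.com/products/"))  # ['GPTBot']
```

    Run it against the key templates of your site, not just the homepage, because disallow rules are often path-specific.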

    Looking Ahead

    LLM crawling is not a passing trend. It is becoming a standard part of how people find information online. The sites that treat LLM readability as a first-class concern today will have a clear advantage as AI-driven search grows. Those that ignore it will watch their content disappear from an increasingly important channel.

    The good news is that most fixes are straightforward. Unblock your robots.txt, check your CDN, and address JavaScript rendering gaps. These are not exotic tasks. They are the same kind of technical hygiene that good SEO has always demanded. The difference now is that the audience includes machines that summarise, cite, and recommend your content to millions of users.

  • What Is the Difference Between AI Mentions and Citations?

    If you have been paying attention to how AI tools like ChatGPT or Google Gemini respond to user queries, you have probably noticed that some brands appear in the text while others get a clickable link at the bottom. These are two very different things. One is a mention. The other is a citation. And the distinction matters more than most marketers realise.

    I have spent the past year studying how large language models reference brands and websites in their outputs. What I have found is that many SEO professionals conflate mentions and citations, treating them as interchangeable. They are not. Understanding the gap between the two is essential if you want your brand to show up properly in AI-generated answers.

    What Counts as a Mention in AI Answers

    A mention happens when an AI model includes your brand name or product name in the body of its response. For example, if you ask ChatGPT “how to edit PDFs,” it might write something like “Adobe Acrobat is a popular tool for editing PDF files.” That is a mention. Adobe and Acrobat appear in the text, but there is no link pointing back to Adobe’s website.

    Mentions come from the model’s training data. The AI has processed billions of web pages and learned associations between brands and topics. It knows that Adobe is connected to PDF editing because that relationship appeared thousands of times across the data it was trained on. The model is not fetching this information live from the web. It is recalling patterns from its training.

    This is an important point. A mention does not mean the AI visited your website or verified your content. It simply means your brand is strongly associated with a given topic in the model’s learned knowledge. You could have zero indexable pages and still get mentioned if your brand is well-known enough.

    How Citations Work Differently

    A citation is something else entirely. It occurs when the AI links to your page as a supporting source for its answer. This typically happens through retrieval-augmented generation (RAG), where the model actively searches the web or a defined index to pull in fresh information before composing its response.

    When a system like Bing Chat or Google’s AI Overview performs a live search, it retrieves web pages, extracts relevant information, and then weaves that into its answer. The pages it pulled from get listed as citations, usually with clickable links. This is a much stronger signal than a mention because it means the AI treated your content as evidence.

    Think of it this way. A mention says “this brand exists and is relevant.” A citation says “this specific page helped me answer the question.” The difference in value is significant for anyone thinking about generative engine optimisation.
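
    When you analyse captured answers, the two signals can be separated with a very small check. This sketch assumes you have the answer text and the list of cited URLs for each response; the function name and the sample URL are illustrative only.

```python
from urllib.parse import urlparse


def classify(answer_text: str, cited_urls: list[str], brand: str, domain: str) -> dict[str, bool]:
    """Report whether a brand is mentioned in the text and cited via its domain."""
    mentioned = brand.lower() in answer_text.lower()
    cited = False
    for url in cited_urls:
        host = urlparse(url).hostname or ""
        # Accept the bare domain and any subdomain, such as www.
        if host == domain or host.endswith("." + domain):
            cited = True
            break
    return {"mention": mentioned, "citation": cited}


answer = "Adobe Acrobat is a popular tool for editing PDF files."

mention_only = classify(answer, [], "Adobe Acrobat", "adobe.com")
with_citation = classify(answer, ["https://www.adobe.com/"], "Adobe Acrobat", "adobe.com")
print(mention_only)
print(with_citation)
```

    Tracking the two flags separately per prompt shows quickly whether a brand is being talked about, used as evidence, or both.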

    Why Most Marketers Get This Wrong

    Here is my contrarian take. Most of the current discourse around “AI SEO” focuses too heavily on getting mentioned. People celebrate when ChatGPT name-drops their brand. But a mention without a citation is a bit like being talked about at a party without anyone knowing your address. It builds awareness, sure. But it does not drive traffic or prove authority in the way a citation does.

    I have seen brands with strong mentions but almost no citations. Their names appear in AI answers, yet the models never link back to their actual content. This usually happens when a brand is famous but its web pages are not structured well enough to be retrieved by RAG systems. The opposite also exists: smaller, well-optimised sites earn citations despite having lower brand recognition.

    The practical lesson here is that optimising for citations requires a different approach than optimising for mentions. Mentions grow from brand awareness and PR. Citations grow from having well-structured, authoritative, and schema-marked content that RAG systems can easily retrieve and verify.

    What This Means for Your Strategy

    If you are serious about showing up in AI-generated results, you need to work on both fronts. For mentions, focus on building genuine brand authority across the web. Get covered by reputable publications. Build a strong presence on platforms that LLMs are trained on. This is long-term brand building.

    For citations, the work is more technical. Make sure your pages are crawlable, fast, and clearly structured. Use proper headings. Include factual, verifiable claims. According to Google’s own E-E-A-T framework, content that demonstrates first-hand experience and expertise is more likely to be deemed trustworthy. RAG systems appear to follow similar logic when selecting which sources to cite.

    From my own testing, pages that answer specific questions clearly and concisely tend to earn more citations than long, rambling guides. The AI is looking for evidence, not filler. Give it a clean answer it can point to.

    The brands that will win in this new era are the ones that understand both signals and treat them as complementary. Mentions build the top of funnel. Citations build the trust. Get both right, and you are well positioned regardless of how AI search evolves from here.

  • What Is AI Visibility Score and How Do You Measure It

    If you have been working on getting your brand visible inside AI-generated answers, you have probably come across the term “visibility score.” It sounds straightforward, but the reality is messier than most people expect. I have spent a fair amount of time testing different AI visibility tools, and I want to share what I have learned about what this metric actually means and which supporting numbers you should watch alongside it.

    What a Visibility Score Actually Tells You

    A visibility score is an aggregate metric. It rolls up several signals into a single number that represents how often and how prominently your brand appears across a set of AI prompts. The inputs typically include whether you were mentioned, whether a citation pointed back to your site, where in the answer your brand appeared, and the sentiment of the mention.

    The trouble is that every tool calculates it differently. There is no universal standard. LLM Optimizer, for instance, weights mentions, citations, URL presence, position (first, second, third, fourth), and sentiment into a composite figure. A brand that gets mentioned first with a positive tone and a citation link scores far higher than one that appears third with no citation and a neutral tone. Other platforms may skip sentiment entirely or weigh position differently.
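
    To show how such a composite can behave, here is a simplified sketch. The weights and credit tables are entirely hypothetical and are not the formula used by LLM Optimizer or any other tool; URL presence is folded into the citation flag to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights; they sum to 1.0 so a perfect result scores 100.
WEIGHTS = {"mention": 0.25, "citation": 0.35, "position": 0.25, "sentiment": 0.15}
POSITION_CREDIT = {1: 1.0, 2: 0.75, 3: 0.5, 4: 0.25}
SENTIMENT_CREDIT = {"positive": 1.0, "neutral": 0.5, "negative": 0.0}


@dataclass
class PromptResult:
    mentioned: bool
    cited: bool
    position: Optional[int]  # 1 to 4, or None if the brand did not appear
    sentiment: str


def visibility_score(result: PromptResult) -> float:
    """Weighted composite on a 0 to 100 scale (illustrative weights only)."""
    if not result.mentioned and not result.cited:
        return 0.0
    parts = {
        "mention": 1.0 if result.mentioned else 0.0,
        "citation": 1.0 if result.cited else 0.0,
        "position": POSITION_CREDIT.get(result.position, 0.0),
        "sentiment": SENTIMENT_CREDIT.get(result.sentiment, 0.0),
    }
    return round(100 * sum(WEIGHTS[key] * parts[key] for key in WEIGHTS), 1)


best = visibility_score(PromptResult(True, True, 1, "positive"))
weaker = visibility_score(PromptResult(True, False, 3, "neutral"))
print(best, weaker)
```

    Change one weight and the same two results land on different numbers, which is exactly why scores from different tools cannot be compared directly.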

    This lack of standardisation is something I think the industry needs to address quickly. If you compare your score across two different tools, you might get wildly different numbers for the same set of prompts. That makes benchmarking against competitors tricky unless everyone agrees on one platform.

    A Real-World Example of How Scores Break Down

    Let me walk through a practical case. Take the prompt “how to make the perfect espresso shot.” In LLM Optimizer, a brand tracking that prompt might see a visibility score of around 22. Why so low? Because the brand was mentioned but had no citation link. The sentiment was neutral, not negative, which helps, but the absence of a URL pointing back to the site drags the score down considerably.

    The ideal scenario would be a mention in the first position, a direct citation to your website, and positive sentiment. That combination pushes you towards 100%. In my experience, very few brands consistently hit that ceiling across a broad prompt set. The ones that do tend to have strong topical authority and structured data that AI models find easy to reference. According to research from Search Engine Land, brands that invest in entity-based SEO tend to perform better in AI-generated results precisely because large language models favour well-structured, authoritative sources.

    Why Visibility Score Alone Is Not Enough

    Here is where I hold a view that goes against the grain. Many marketers treat visibility score as the single north-star metric for AI search performance. I think that is a mistake. The score is too broad to act on directly. If your visibility score drops by ten points this week, what exactly do you fix? The number itself does not tell you.

    You need to pair it with more granular metrics. Brand mentions over time show you whether your presence is growing or shrinking. Citation tracking tells you if AI models are actually linking back to your content. Agentic traffic and referral data from tools like Google Analytics reveal whether those AI mentions translate into real visits. Without these supporting signals, you are flying blind with a single number that could move for a dozen different reasons.

    I have been doing SEO and digital marketing for over fifteen years, and every time a new “single metric” emerges, teams fixate on it at the expense of nuance. Visibility score is useful for board-level reporting, but the actual optimisation work happens when you drill into the components beneath it.

    Do Not Forget AI Features in Traditional Search

    One detail that often gets overlooked is that AI features inside traditional search results, such as Google’s AI Overviews, are frequently counted as part of your overall search performance reports. This means your visibility score and your standard SEO metrics are not entirely separate worlds. If you are tracking performance in Google Search Console, some of those impressions may already include AI-generated snippets.

    The practical takeaway is that you need to blend your AI visibility data with your existing search analytics. Looking at either in isolation gives you an incomplete picture. A high visibility score in ChatGPT or Perplexity means little if those mentions never convert into site traffic, and a dip in organic impressions might partly be explained by shifts in AI feature placement rather than a ranking penalty.

    Picking the Right Metrics for Your Situation

    If I had to recommend a starting dashboard for AI visibility, it would include four things: the aggregate visibility score for trend monitoring, citation count with URLs to see which pages AI models prefer, sentiment breakdown to catch reputation issues early, and referral traffic from AI sources to measure actual business impact.

    Start with those four and expand as your understanding deepens. The tools are evolving quickly and standardisation will come eventually. Until then, pick one platform, learn its methodology inside out, and resist the temptation to chase a perfect score. The brands that win in AI search will be the ones that understand what sits behind the number, not just the number itself.

  • How to Map Prompts to Personas for Better LLM Visibility

    Most businesses treat their audience as one big group when optimising for large language model visibility. They write a single set of prompts, test them broadly and call it a day. The trouble is, averaging your visibility across an entire audience hides the gaps where you are invisible to the people who matter most. Mapping prompts to specific personas is the fix, and it is simpler than you might think.

    Why One-Size-Fits-All Prompting Falls Short

    When I first started testing how brands appear inside AI-generated answers, I made the same mistake everyone else does. I wrote prompts from my own point of view and assumed the results spoke for the whole market. They did not. A procurement director searching for manufacturing software asks questions nothing like those a graduate engineer would type. Their vocabulary differs, their intent differs and the depth of answer they expect differs. If you only test with generic prompts, you will see a comfortable average that masks real blind spots.

    Research from the Search Engine Land guide on GEO confirms that generative engine optimisation requires thinking about user intent at a granular level. Generic content may rank, but it rarely gets cited when an LLM assembles a tailored response for a specific user need.

    What Persona-Based Prompt Mapping Actually Means

    Persona-based prompt mapping means grouping your test prompts by a real user type. Not a fictional marketing avatar with a name and a stock photo, but a practical profile built on genuine differences in intent, language and expectations. Think of categories like these:

    • Decision makers who need ROI figures and comparisons.
    • Practitioners who want step-by-step technical detail.
    • Beginners who ask broad, exploratory questions.
    • Troubleshooters who arrive with a specific problem to solve.

    Each group phrases questions differently and expects a different shape of answer. A decision maker might prompt an LLM with “best enterprise CRM for mid-market manufacturers,” while a practitioner asks “how to configure lead scoring rules in HubSpot.” Testing both tells you where your content actually performs and where it vanishes.

    How I Build Persona Prompt Clusters

    Inside LLM Optimizer, the workflow I recommend starts with listing your ideal customer profiles. For each profile, brainstorm the questions that person would realistically put to ChatGPT, Gemini or Perplexity. Group those questions into topic clusters, then run them as tracked prompts.

    Here is a contrarian take that might raise eyebrows: I believe most SEO professionals over-invest in keyword volume data and under-invest in prompt diversity. Volume tells you what people typed into Google last month. Prompt mapping tells you what people will ask an AI model tomorrow. The two data sets overlap, but they are not the same, and the gap is growing as conversational search behaviour evolves. A study published by researchers at IIT Delhi and Princeton showed that GEO tactics like authoritative language and citation inclusion boosted visibility in generative engines by up to 40 percent, but only when the content matched the query intent closely.

    Once your clusters are running, compare visibility scores across personas. You will almost certainly find that your brand shows up well for one audience segment and poorly for another. That gap is your opportunity.
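
    The comparison across personas can be sketched in a few lines of Python. The rows below are invented sample data, and `scores_by_persona` is a hypothetical helper, not a feature of any tool; the point is only to show how a per-persona average exposes what an overall average hides.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tracked prompts: (persona, prompt, visibility score 0-100).
results = [
    ("decision maker", "best enterprise CRM for mid-market manufacturers", 62),
    ("decision maker", "CRM total cost of ownership comparison", 48),
    ("practitioner", "how to configure lead scoring rules in HubSpot", 12),
    ("practitioner", "CRM API rate limits explained", 20),
    ("beginner", "what is a CRM", 35),
]


def scores_by_persona(rows):
    """Average the score per persona and return the weakest persona first."""
    grouped = defaultdict(list)
    for persona, _prompt, score in rows:
        grouped[persona].append(score)
    averages = {persona: mean(scores) for persona, scores in grouped.items()}
    return sorted(averages.items(), key=lambda item: item[1])


overall = mean(score for _persona, _prompt, score in results)
print(f"overall average: {overall:.1f}")
for persona, average in scores_by_persona(results):
    print(f"{persona:15s} {average:5.1f}")
```

    In this made-up sample the overall average looks tolerable while the practitioner cluster sits far below it, which is the kind of gap a single blended number conceals.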

    Filling the Gaps Your Data Reveals

    After identifying weak spots, the content work becomes targeted. If decision makers see your brand but beginners do not, you likely lack introductory explainer content. If troubleshooters find you but practitioners do not, your how-to guides may need more technical depth. This is where first-hand experience matters. I have spent the past two years auditing LLM outputs for clients across manufacturing, SaaS and professional services, and the pattern repeats: brands that write for a single reader profile leave entire personas on the table.

    The Google helpful content guidelines stress demonstrating experience and expertise. That principle applies just as strongly to LLM visibility. Models trained partly on web content inherit the same quality signals. If your page reads like it was written by someone who has genuinely done the work, it stands a better chance of being surfaced in an AI-generated answer.

    Where This Is Heading

    Persona-based prompt mapping is not a one-off audit. As LLMs update their training data and refine how they select sources, the prompts that matter will shift too. I run my clusters on a rolling monthly cycle so that changes surface quickly. The brands that build this habit now will have a structural advantage as AI-driven search grows. Those still relying on a single averaged visibility score will keep wondering why their traffic from generative engines stays flat.

    Start small. Pick two or three personas, write ten prompts for each and track the results for a month. The data will speak for itself, and you will never go back to treating your audience as a single block again.

  • What Is AI Brand Monitoring and Why Does It Matter

    I have spent the past year watching how large language models talk about brands, products and services. What I have found is both fascinating and slightly unsettling. AI systems do not pull from a single, frozen database. They update, they re-crawl, and they change their answers without warning. If you are not keeping an eye on what they say about you, you are flying blind.

    Why AI Answers About Your Brand Keep Shifting

    Most people assume that once an AI gives a correct answer about their company, the job is done. That is wrong. Models get retrained. The web changes daily. Even a small tweak to a user’s prompt can produce a wildly different output. I have seen cases where a brand was cited accurately on Monday and dropped entirely by Thursday. Three weeks later it reappeared. This is not a bug; it is how these systems work.

    Think of monitoring as quality assurance for external narratives. You already monitor your Google rankings, your social mentions and your review scores. AI brand monitoring is simply the next layer. According to Gartner’s overview of generative AI, these models are reshaping how consumers discover and evaluate products. If you ignore that channel, someone else will fill the gap with information you cannot control.

    The Real Cost of Incorrect AI Responses

    Here is where my experience diverges from the usual optimism. Many marketers treat AI visibility as a nice-to-have. I would argue it is closer to a reputational risk. I have personally encountered third-party websites carrying outdated or flat-out wrong product descriptions. When an LLM picks up that misinformation and serves it to a potential customer, the damage is real. The customer might buy the wrong product, receive a service that does not match expectations, or simply lose trust in the brand.

    Returns, complaints and negative word of mouth all follow. A BrightLocal consumer survey found that the majority of consumers trust online information as much as personal recommendations. When that information comes from an AI chatbot, the stakes are even higher because users often treat it as a single authoritative source rather than one result among many.

    How Weekly Monitoring Catches Problems Early

    Daily checks are available, but from what I have seen, a weekly cadence strikes the right balance between vigilance and practicality. Tools like LLM Optimizer let you track how and when your brand appears in AI-generated answers over time. You get a historical view that shows patterns rather than snapshots.

    A weekly review lets your team spot factual errors before they spread. Maybe your website is missing key product specifications. Maybe a competitor comparison on an external site is misleading. Maybe your opening hours changed six months ago and nobody updated the third-party listing. These are exactly the sorts of gaps that LLMs surface, and fixing them improves not just your AI visibility but your overall online accuracy.

    I keep a simple checklist: run the monitoring report, flag any new errors or omissions, trace each issue back to its source, and fix it there. Most weeks there is nothing urgent. But when something does slip through, catching it in seven days rather than seven months can save a significant amount of revenue and reputation.

    A Contrarian View on Chasing AI Visibility

    I should be honest about something. Not every business needs to obsess over AI brand mentions right now. If your customers are not yet using ChatGPT, Gemini or Copilot to research your type of product, pouring resources into LLM optimisation may be premature. The people selling AI monitoring tools have an obvious incentive to tell you otherwise. Start by checking whether AI-generated answers actually appear for queries relevant to your industry. If they do not, focus your energy on the channels that already drive revenue and revisit AI monitoring in six months.

    That said, for any brand operating in a space where consumers do turn to AI for recommendations, comparisons or how-to guidance, monitoring is not optional. The information gap between what you publish and what AI tells users will only widen if left unchecked. A study from the Reuters Institute Digital News Report highlights how quickly AI-driven search is changing information discovery habits, and the trend shows no sign of slowing.

    Getting Started Without Overcomplicating It

    You do not need a massive budget or a dedicated team. Pick two or three prompts that a potential customer might type into an AI chatbot about your brand. Run them yourself across ChatGPT and at least one other model. Note what comes back. Is it accurate? Is your brand mentioned at all? Are competitors positioned more favourably?

    Do this once a week for a month. You will quickly see whether the answers are stable or volatile, correct or misleading. From there you can decide whether a paid monitoring tool is worth the investment or whether manual checks are enough for your scale. The important thing is to start looking, because what AI says about your brand is already shaping how people perceive you, whether you are watching or not.

  • How to Improve What AI Says About Your Brand

    AI assistants are quickly becoming the first place people turn when researching a product or service. If what ChatGPT, Gemini or Perplexity says about your brand is wrong, outdated or vague, you are losing trust before a prospect ever visits your website. The good news is that you can shape these answers, not by flipping a secret switch, but by improving the information environment that AI models pull from.

    Why AI Answers Matter More Than You Think

    When someone asks an AI assistant about your business, the model does not make things up from thin air. It synthesises information from web pages, documentation, reviews and third-party mentions. If those sources contain conflicting details, the AI will either pick one at random or hedge with a vague summary. Neither outcome helps your business.

    I have seen this first-hand with clients who updated their product line months ago but never revised the copy on their own website. The old specs kept appearing in AI-generated summaries because the model had no reason to prefer the new information over the old. Consistency across every touchpoint is not optional; it is the foundation of accurate AI representation.

    Make Your Website the Clearest Source of Truth

    The single most effective step is to turn your own site into the most authoritative, up-to-date reference for everything about your brand. That means reviewing every page for outdated claims, conflicting prices, retired features and broken links. If your About page says one thing and your FAQ says another, an AI model has no reliable way to decide which is correct.

    Start with the basics. Make sure product descriptions, service offerings and company details match across every page. Add supporting evidence wherever you can: methodology notes, data points, case studies and structured documentation. According to Google’s structured data guidelines, well-organised markup helps crawlers understand content faster and more accurately. The same principle applies to the large language models that now index your pages.

    One thing many guides skip is crawlability. If important pages sit behind JavaScript tabs, login walls or lazy-loading scripts that block bots, AI systems simply will not see the content. Check your robots.txt and make sure the pages you care about most are fully accessible.
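
    A quick way to run the robots.txt check is Python's standard `urllib.robotparser`. The sketch below parses an example file held in a string, so the rules, the URL and the list of agent names are assumptions for illustration; in practice you would load your own robots.txt and test the agents you care about.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (invented); replace with the contents of your own file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def allowed_agents(robots_txt, agents, url):
    """Return {agent: bool} saying whether each agent may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


page = "https://example.com/products/espresso-machine"
print(allowed_agents(ROBOTS_TXT, ["GPTBot", "OAI-SearchBot", "Googlebot"], page))
```

    Note that this only tests robots.txt rules. Content hidden behind JavaScript or a login wall, or blocked at the CDN, will not show up here and has to be checked separately.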

    Align Third-Party Sources With Your Message

    Your website alone is not enough. AI models weigh third-party mentions heavily because independent sources signal credibility. If a well-known review site describes your service differently from how you describe it yourself, the AI may favour the external version.

    Audit what others say about you. Search for your brand on major directories, review platforms and industry publications. Where the information is wrong, reach out and request corrections. Where it is simply thin, consider contributing guest posts or providing updated media kits that journalists and bloggers can reference. Tools like LLM Optimizer let you inspect which citations AI models are pulling for specific prompts, so you can see exactly where the gaps are.

    Here is where I hold a contrarian view: most marketers focus on creating new content to influence AI answers. I believe the higher-return activity is fixing existing content. A single contradictory page on an authoritative domain can override ten blog posts on your own site. Correcting that one page often does more than a month of fresh publishing.

    Use Prompt-Based Auditing to Track Progress

    You would not run a paid ad campaign without checking the metrics. The same logic applies here. Regularly query AI assistants with the prompts your customers are likely to use. Note what the model says, which sources it cites, and whether the answer has improved since your last check.

    In the video above, I walk through a practical example using an espresso machine brand. The company wanted AI assistants to recommend a specific brewing time. By ensuring their own site stated the same figure that appeared on reputable coffee review sites, the AI answer converged on the correct recommendation. It was not instant, but over a few weeks the results shifted noticeably.

    Document your findings in a simple spreadsheet: prompt, AI response, cited sources, date. Over time this gives you a clear picture of which changes moved the needle and which did not. Search Engine Land’s guide on influencing AI answers offers a useful framework for structuring this kind of audit.
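
    If you prefer a file to a manual spreadsheet, the same four columns can be written as CSV. This is a minimal sketch: the helper name `append_audit_row`, the column names and the sample row are all assumptions, and the demonstration writes to an in-memory buffer rather than a real file.

```python
import csv
import io
from datetime import date

FIELDS = ["date", "prompt", "ai_response", "cited_sources"]


def append_audit_row(handle, prompt, ai_response, cited_sources, when=None):
    """Write one audit observation as a CSV row to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    if handle.tell() == 0:
        writer.writeheader()  # only an empty file gets the header row
    writer.writerow({
        "date": (when or date.today()).isoformat(),
        "prompt": prompt,
        "ai_response": ai_response,
        "cited_sources": "; ".join(cited_sources),
    })


# In-memory demonstration; for a real log use open("audit.csv", "a", newline="").
buffer = io.StringIO()
append_audit_row(
    buffer,
    prompt="ideal espresso brewing time",
    ai_response="(paste the assistant's answer here)",
    cited_sources=["https://example.com/espresso-guide"],
    when=date(2026, 6, 1),
)
print(buffer.getvalue())
```

    Keeping the date in ISO format makes the file sortable, so week-over-week changes in the answer and its sources are easy to spot.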

    What This Means Going Forward

    AI-generated answers are only going to become more prominent. As models improve and more people rely on them for purchase decisions, the brands that maintain clean, consistent and well-sourced information will have a structural advantage. Those that ignore this shift risk being misrepresented in the very conversations that drive buying decisions.

    The work is not glamorous. It is auditing old pages, emailing webmasters and updating product specs. But it is the kind of steady, evidence-based effort that compounds over time. Start with your own site, expand to third-party sources and measure the results. The brands that treat AI accuracy as an ongoing discipline, rather than a one-off project, will be the ones that earn the most accurate and favourable mentions in the months ahead.

  • What Are AI Citations and Why They Can Be Wrong

    Most people assume that when an AI assistant provides a link, it must be real. After all, the tool searched the web and found a source, so the citation should be trustworthy. The truth is far less reassuring. AI citations are a mixture of genuine references and fabricated URLs, and the difference between the two is not always obvious.

    In this article, I explain what AI citations actually are, how AI assistants decide when to fetch outside sources, and why you should verify every link before trusting it.

    How AI Assistants Choose Between Memory and Search

    An AI assistant can respond in two fundamentally different ways. The first is answering from its training data, the vast body of text it was exposed to during training. When you ask something general, such as how to edit a PDF, the model often has enough stored knowledge to produce a useful answer without looking anything up. The second approach involves a retrieval step. The model searches the web or pulls documents from an index, then writes an answer grounded in those documents.

    I have tested this myself many times. A question like “how do I edit a PDF” typically gets answered from memory. But a time-sensitive question like “what is the weather in Zurich today” forces the model to search, because its training data cannot possibly contain today’s forecast. The decision between these two paths is not random. It depends on whether the model judges the query to require fresh or external information.

    What surprises many users is that the model does not always get this judgment right. Sometimes it answers from memory when a search would have been more accurate. Other times it searches unnecessarily. This inconsistency is part of why AI-generated citations can be unreliable, and it is something most providers are still working to improve.

    What AI Citations Actually Are

    Citations in AI assistants appear as clickable links or small reference boxes alongside the generated text. In tools like ChatGPT, they often show up as numbered grey boxes that you can click to visit the source. When the assistant performs a retrieval step, these citations point to the web pages or documents it consulted. They serve a similar purpose to footnotes in academic writing: they tell you where the information supposedly came from.

    However, there is a critical distinction between citations produced after a genuine search and links that the model generates from memory. According to OpenAI’s documentation on browsing, ChatGPT uses a browsing tool to fetch real-time information. When the browsing tool is active, the citations are grounded in actual retrieved pages. When it is not, any URLs in the response come from the model’s training data, and those links may no longer exist or may never have existed at all.

    This is the core problem. The visual presentation of a citation looks identical whether the link is real or invented. There is no label that says “this one was actually retrieved” versus “this one was generated from patterns in the training data.”

    Why AI Citations Are Often Wrong

    Here is where my view parts from the optimists. I believe the citation problem in AI assistants is more serious than most people acknowledge. The term for invented references is “hallucination,” and it affects URLs just as much as it affects factual claims. A model might generate a URL that looks plausible, follows the correct domain structure, and even includes a realistic page slug, yet leads to a 404 error when you actually click it.

    I have seen this repeatedly in my own server logs. Hallucinated URLs from AI tools generate real HTTP requests that hit real websites and return 404 responses. If you run a website and notice a sudden increase in 404 errors with oddly specific paths, AI-generated links could be the cause. A study published on arXiv confirmed that large language models frequently produce non-existent references, especially when generating academic citations.
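
    To look for this pattern in your own logs, a short script that counts 404 paths is enough to start. The sketch below assumes the Common or Combined Log Format; the sample lines are invented, and `top_missing_paths` is a hypothetical helper. It cannot prove that a request came from an AI-generated link, it only surfaces the oddly specific paths worth investigating.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common/Combined Log Format line.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def top_missing_paths(lines, limit=10):
    """Count paths that returned 404, most frequent first."""
    misses = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            misses[match.group("path")] += 1
    return misses.most_common(limit)


# Invented sample lines using documentation IP addresses.
sample = [
    '203.0.113.5 - - [01/Jun/2026:10:00:00 +0000] "GET /blog/llms-txt-guide-2026 HTTP/1.1" 404 153 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [01/Jun/2026:10:05:00 +0000] "GET /blog/llms-txt-guide-2026 HTTP/1.1" 404 153 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [01/Jun/2026:10:06:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.8 - - [01/Jun/2026:10:07:00 +0000] "GET /pricing/enterprise-faq HTTP/1.1" 404 153 "-" "Mozilla/5.0"',
]

for path, count in top_missing_paths(sample):
    print(count, path)
```

    A path that was never published but keeps attracting requests is a good candidate for a redirect to the closest real page.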

    The risk is not just inconvenience. If you are researching a medical question, a legal issue, or a financial decision, a hallucinated citation can lend false authority to bad information. The link looks credible. The surrounding text reads confidently. But the source does not exist.

    How to Verify AI Citations Before Trusting Them

    The practical solution is straightforward: open every citation in a new tab before you rely on it. If the page loads and contains the information the AI referenced, you can have some confidence in that particular claim. If you get a 404 or the page content does not match the AI’s summary, discard it.

    Beyond manual checking, look for signals that a retrieval step actually happened. In Google’s Gemini, for instance, you can sometimes see a “search” indicator that confirms the model queried the web. If no such indicator is present, treat any links with extra caution. I also recommend cross-referencing important claims with a traditional search engine. It takes an extra minute, but it can save you from citing a source that does not exist.

    Some users assume that paid tiers or newer models are immune to this problem. They are not. While retrieval-augmented generation has improved citation accuracy, no current system guarantees that every link is valid. Trusting AI citations blindly is a habit worth breaking now, before it costs you credibility.

    Where AI Citations Are Heading

    The next generation of AI tools will likely separate retrieved citations from generated ones more clearly. I expect we will see explicit labelling, confidence scores, and perhaps even automated link-checking built into the response pipeline. Some early experiments with attributed question answering at Google Research point in this direction.

    Until those safeguards arrive, the responsibility sits with the user. Every AI citation is a claim, not a fact. Treat it accordingly, verify before you share, and remember that a confident-sounding answer with a broken link is worse than no answer at all.

  • What Is Sentiment in AI Answers and Why Does It Matter

    When you ask ChatGPT or Google’s AI Overviews a question, the words in the answer carry emotional weight. That emotional direction, whether positive, negative or neutral, is what we call sentiment. Most people never think about it, but sentiment in AI-generated answers quietly shapes how users feel about brands, products and even medical advice.

    What Sentiment Actually Means in AI Responses

    Sentiment is the emotional tone embedded in language. In a traditional search result, you click through to a webpage and form your own opinion from the content you read. With AI answers, the model has already done that work for you. It has synthesised sources, picked specific words and delivered a response that leans in a particular emotional direction.

    This matters because the AI’s word choices influence perception at scale. If an AI assistant describes a brand as “reliable and well-regarded,” that is a positive sentiment signal. If it says a product “has faced criticism for quality issues,” that is negative. The user did not visit any website. They simply absorbed the AI’s framing as fact.

    I have spent the past year building and refining sentiment tracking inside our LLM Optimizer tool, and the patterns we see are striking. The same brand can shift from mostly negative AI mentions to positive ones over a matter of weeks, depending on what new content the models ingest.

    Where Sentiment Analysis Gets It Wrong

    Here is my contrarian take: most off-the-shelf sentiment analysis is not good enough for AI answer monitoring. Standard NLP classifiers were trained on product reviews and social media posts. They struggle badly with the nuanced, synthesised language that large language models produce.

    We hit this problem early on. Take the query “best protein for weight loss.” The word “loss” is typically flagged as negative by basic sentiment models. But in a health and fitness context, weight loss is the desired outcome. It is entirely positive. We saw the same issue with pharmaceutical queries where terms like “drug,” “side effects” and “withdrawal” kept triggering spurious negative labels even when the AI answer was recommending a product favourably.

    Sarcasm is another blind spot. If an AI response says something like “sure, if you enjoy waiting three weeks for delivery, this is the brand for you,” a naive classifier might score that as positive because of the word “enjoy.” According to research from Stanford’s NLP group, sarcasm detection remains one of the hardest unsolved problems in sentiment analysis, and AI-generated text adds another layer of complexity.

    Domain-specific language trips things up constantly. You need classifiers that understand industry context, not just generic positive and negative word lists.
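
    The “weight loss” problem is easy to reproduce with a toy word-list scorer. The sketch below is not how our tool or any production classifier works; the lexicon, the domain phrase table and both function names are assumptions, kept deliberately tiny to show why context has to override individual words.

```python
import re

# Tiny generic lexicon of the kind a basic word-list scorer relies on (illustrative).
GENERIC = {"loss": -1, "criticism": -1, "withdrawal": -1, "reliable": 1, "recommended": 1}

# Hypothetical domain overrides: phrases whose polarity differs in a health context.
HEALTH_PHRASES = {"weight loss": 1, "side effects": 0}


def naive_score(text, lexicon=GENERIC):
    """Sum per-word polarity with no awareness of context."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def domain_score(text, phrases=HEALTH_PHRASES, lexicon=GENERIC):
    """Score known domain phrases first, then score whatever words remain."""
    remaining = text.lower()
    total = 0
    for phrase, polarity in phrases.items():
        total += polarity * remaining.count(phrase)
        remaining = remaining.replace(phrase, " ")
    return total + naive_score(remaining, lexicon)


query = "best protein for weight loss"
print(naive_score(query), domain_score(query))
```

    The naive scorer reads the query as negative because of “loss”, while the domain-aware version treats “weight loss” as one positive phrase. Real classifiers are far more sophisticated, but the failure mode is the same.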

    Why the Same Prompt Can Produce Different Sentiment

    One thing that surprises people is how inconsistent AI sentiment can be for identical queries. I have tested the same prompt on consecutive days and received answers with noticeably different tones. Sometimes the response is enthusiastic and recommending. Other times it is cautious and hedging.

    There are a few reasons for this. Large language models have a degree of randomness built into their generation process through temperature settings. Personalisation also plays a role. If the model has context about you from previous interactions, it may adjust its tone accordingly. And as models get updated with fresh training data, the underlying sentiment towards a topic can shift entirely.

    This variability is exactly why point-in-time sentiment checks are not enough. You need to track sentiment over time to see real trends rather than reacting to a single snapshot.

    How I Track Sentiment for Brands in Practice

    In our tool, we monitor sentiment across AI platforms on an ongoing basis. For each brand we track, the system logs whether individual AI responses are positive, neutral or negative. Over weeks and months, this builds into a trend line that tells a clear story.
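
    The logging-to-trend step can be illustrated with a short Python sketch. The observations are invented, and `weekly_positive_share` is a hypothetical helper that assumes one simple definition of the trend: the share of positive responses per ISO week. Our tool's internal calculation is not shown here.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical logged observations: (date of the AI response, sentiment label).
observations = [
    (date(2026, 5, 4), "negative"),
    (date(2026, 5, 5), "negative"),
    (date(2026, 5, 6), "neutral"),
    (date(2026, 5, 12), "positive"),
    (date(2026, 5, 13), "neutral"),
    (date(2026, 5, 14), "positive"),
]


def weekly_positive_share(rows):
    """Return {(iso_year, iso_week): share of positive responses}, oldest week first."""
    buckets = defaultdict(Counter)
    for day, label in rows:
        year, week, _weekday = day.isocalendar()
        buckets[(year, week)][label] += 1
    return {
        key: counts["positive"] / sum(counts.values())
        for key, counts in sorted(buckets.items())
    }


for (year, week), share in weekly_positive_share(observations).items():
    print(f"{year}-W{week:02d}: {share:.0%} positive")
```

    Bucketing by week smooths out the day-to-day variation described earlier, so a single cautious answer does not look like a trend.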

    For one client in the nutrition space, we watched their sentiment score climb from mostly red (negative) to predominantly green (positive) over about six weeks. The shift correlated directly with a content strategy we had implemented: publishing more expert-authored articles, earning mentions on authoritative health sites and ensuring consistent brand messaging across platforms that AI models tend to reference.

    The breakdown at the prompt level is just as useful. You can see exactly which queries trigger negative sentiment and work backwards to understand why. Often it comes down to a single problematic source that the AI keeps citing, or outdated information that still lingers in the model’s training data.

    What This Means for Your Brand Going Forward

    AI answers are becoming a primary information channel for millions of users. The sentiment those answers carry about your brand is not something you can afford to ignore. Unlike traditional search where you control your own page’s messaging, AI responses are generated from a mix of sources you may not even know about.

    My recommendation is simple. Start monitoring how AI models talk about you. Look beyond just whether you are mentioned and examine the emotional tone of those mentions. Build a content strategy that feeds positive, accurate, expert-backed information into the ecosystem that these models draw from.

    Sentiment in AI answers is still a young field, and the tools for measuring it are improving rapidly. The brands that pay attention to this now will have a significant advantage as AI-generated answers become the default way people discover and evaluate products and services. The question is not whether AI sentiment matters. It is whether you are measuring it yet.

  • What Is Agentic Traffic in SEO?

    AI agents are now browsing websites on behalf of users. They click buttons, fill in forms, and pull content back to chatbots like ChatGPT and Perplexity. This new wave of automated visits is called agentic traffic, and most site owners have no idea it is happening.

    I have spent the last few months tracking how these AI agents interact with client websites. What I found surprised me. Many sites are accidentally blocking or confusing the very bots that could send them qualified visitors. In this article I will explain what agentic traffic is, why it matters, and how you can optimise for it.

    How Agentic Traffic Differs from Traditional Crawlers

    Traditional crawlers like Googlebot visit your pages, read the HTML, and index the content. They do not interact with the page. Agentic traffic is different. AI agents actively browse your site. They render JavaScript, attempt to click elements, and even try to submit forms.

    Think of it this way. Googlebot is a reader. An AI agent is a user. It behaves more like a real person would, except it makes decisions based on page structure rather than visual cues. If your site has confusing overlays, misplaced buttons, or poorly labelled form fields, the agent will struggle. And when it struggles, it leaves.

    Search Engine Land’s coverage of agentic SEO confirms that this shift is already well under way. The agents visiting your site today come from ChatGPT, Perplexity, and a growing list of AI-powered browsers.

    Why Most Websites Fail the Agentic Test

    Here is a stat that caught my attention. When I audited around a hundred websites for agentic compatibility, roughly 95% had issues with form submissions. The AI agents could not fill in and submit forms properly. That is not a small problem if your business depends on lead generation.

    The most common issues I see are:

    • CDN or firewall rules blocking AI user agents entirely
    • Heavy client-side rendering that hides content from bots
    • Overlays and pop-ups that confuse agent navigation
    • Form fields without clear labels or accessible markup

    On one client site, the AI agents could only see 49% of the page content. The rest was locked behind JavaScript rendering that the bots could not process. Half the page was invisible. That is a huge missed opportunity, and the client had no idea until we measured it.

    Tracking and Measuring Agentic Traffic

    You cannot optimise what you do not measure. The first step is to identify which AI agents visit your site and whether they succeed. I track metrics like hit counts, success rates per agent, and the percentage of content visible to non-JavaScript renderers.

    For example, on one project I monitored 2,600 agentic hits over four weeks. ChatGPT’s agent had a 73% success rate. That means 27% of the time it failed to retrieve what it needed. By drilling into the failing URLs, I found specific pages where the CDN was blocking requests. A simple configuration change fixed it overnight.
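
    A minimal sketch of how per-agent hit counts and success rates like these could be computed from access logs. The simplified log layout, the sample lines, and the AGENT_TOKENS list are assumptions for illustration, not the format of any particular CDN.

```python
import re
from collections import defaultdict

# Hypothetical substrings used to recognise AI agents in the User-Agent header.
AGENT_TOKENS = ["ChatGPT-User", "PerplexityBot", "GPTBot"]

# Assumed simplified log layout: METHOD PATH STATUS "USER AGENT"
LINE_RE = re.compile(r'^(?P<method>\S+) (?P<path>\S+) (?P<status>\d{3}) "(?P<ua>[^"]*)"$')


def agent_stats(lines):
    """Return {agent: {"hits": n, "ok": n, "failed_paths": [...]}}."""
    stats = defaultdict(lambda: {"hits": 0, "ok": 0, "failed_paths": []})
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed layout
        agent = next((t for t in AGENT_TOKENS if t in match["ua"]), None)
        if agent is None:
            continue  # not one of the agents we are tracking
        entry = stats[agent]
        entry["hits"] += 1
        # Assumption: 2xx and 3xx responses count as success.
        if 200 <= int(match["status"]) < 400:
            entry["ok"] += 1
        else:
            entry["failed_paths"].append(match["path"])
    return dict(stats)


def success_rate(entry):
    """Share of an agent's requests that succeeded."""
    return entry["ok"] / entry["hits"] if entry["hits"] else 0.0


# Made-up sample lines, not real log data.
sample = [
    'GET /pricing 200 "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"',
    'GET /contact 403 "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"',
    'GET /blog 200 "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    'GET /blog 200 "Mozilla/5.0 (Windows NT 10.0) Chrome/120"',
]
stats = agent_stats(sample)
for agent, entry in stats.items():
    print(agent, entry["hits"], f"{success_rate(entry):.0%}", entry["failed_paths"])
```

    The failed_paths list is what makes the drill-down possible: it points straight at the URLs where an agent was turned away.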

    Google’s own documentation on crawlers and user agents is a good starting point for understanding bot identification. But keep in mind that agentic bots behave differently from traditional search crawlers, so your monitoring needs to go further.

    How to Optimise Your Site for AI Agents

    My honest take: most of the advice floating around about AI optimisation focuses on prompt engineering and content formatting. That matters, but it misses the bigger problem. If the agent cannot even access your page or interact with it, your content is irrelevant. Fix the plumbing before you worry about the words.

    Here is what I recommend based on my own testing:

    • Audit your server logs for AI user agents. Check if they get 200 responses or errors.
    • Test your pages with JavaScript disabled. If critical content disappears, you have a rendering problem.
    • Remove or defer overlays that appear before the main content loads.
    • Ensure all form fields have proper labels and that forms work without JavaScript where possible.
    • Review your CDN and WAF rules. Many default configurations block legitimate AI agents.
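
    The JavaScript-disabled test can be turned into a number. The sketch below compares the words in the raw HTML response with the words in the fully rendered page and reports how much of the rendered text a non-rendering agent would see. It assumes you already have both versions as strings (the rendered one would typically come from a headless browser, which is not shown here); the function names and sample pages are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible words, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.lower().split())


def visible_words(html):
    """Return the lowercased words a text-only reader would see."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.words


def visibility_ratio(raw_html, rendered_html):
    """Share of the rendered page's words that already exist in the raw HTML."""
    rendered = visible_words(rendered_html)
    if not rendered:
        return 1.0  # nothing to see in either version
    raw = set(visible_words(raw_html))
    return sum(word in raw for word in rendered) / len(rendered)


# Made-up example: the raw response ships an empty app container.
raw_page = "<html><body><h1>Pricing plans</h1><div id='app'></div><script>render()</script></body></html>"
rendered_page = "<html><body><h1>Pricing plans</h1><div id='app'><p>Starter plan</p></div></body></html>"
ratio = visibility_ratio(raw_page, rendered_page)
print(f"{ratio:.0%} of the rendered text is present without JavaScript")
```

    Word overlap is a crude proxy, but it is enough to flag pages where most of the content only exists after client-side rendering.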

    The Search Engine Journal technical SEO guide covers many of these fundamentals, though their advice is geared towards traditional bots. The principles of clean structure, accessible markup, and fast server responses apply even more to agentic visitors.

    Where Agentic Traffic Is Heading

    We are still in the early days. Right now, AI agents mostly retrieve and summarise content. But the next generation of agent browsers will do much more. They will complete purchases, book appointments, and fill in applications on behalf of users. If your site is not ready for that, you will lose conversions to competitors whose sites are.

    I expect agentic traffic to become a standard metric in SEO reporting within the next year. The sites that start tracking and optimising for it now will have a clear advantage. Those that ignore it will wonder why their traffic numbers look fine but their conversions keep dropping.

    The shift from passive crawling to active browsing changes what it means to have a well-optimised website. Start measuring your agentic traffic today. Find the gaps. Fix the access issues. Your future visitors, both human and artificial, will thank you.

  • What Is GEO Measurement and Why Is It So Hard?

    Most marketers talk about Generative Engine Optimisation as though it were just SEO with a new hat. But when you try to measure GEO performance, you quickly discover a problem: the numbers shift every time you look at them. AI responses are non-deterministic, region-dependent and shaped by personalisation. That makes reliable measurement genuinely difficult.

    I have spent the past year tracking how brands appear inside AI-generated answers. In that time I have watched the same prompt return different brand mentions on Monday and Tuesday, from the same model, in the same location. If you are investing in GEO, you need to understand exactly where measurement stands today and where the gaps still sit.

    Why AI Responses Keep Changing

    Large language models are non-deterministic by design. Run the same prompt twice and you can get different wording, different sources cited and different brands mentioned. This is not a bug. It is how probabilistic text generation works. Temperature settings, model updates and cached context all influence output.

    For traditional search, you could pull rank-tracking data and trust it for a week. With GEO, a single data point tells you very little. You need repeated samples across time, regions and user states to build a picture you can act on. Google’s own documentation on AI principles acknowledges that model behaviour varies with context. That variation flows straight into your measurement challenge.

    Region and Personalisation Add Noise

    Location matters more than most people realise. A prompt about “best project management tools” will surface different brands in Spain than in the United States. The model draws on regional training data, local popularity signals and language cues. If your measurement tool does not simulate queries from the correct region, your data is misleading from the start.

    Personalisation adds another layer. When a user is logged into ChatGPT or Gemini, their conversation history and preferences shape the response. That means two users asking the same question can see completely different brand recommendations. Measuring “your” visibility in AI answers is therefore always an approximation. The best you can do is control for region, strip out personalisation where possible and sample frequently.

    Here is my contrarian take: most GEO tools overstate their accuracy because they run a handful of prompts from a single location and call it visibility data. That is not measurement. That is a screenshot. Real measurement requires hundreds of prompt variations, multiple regions and longitudinal tracking. If a vendor cannot explain their sampling methodology, treat their numbers with scepticism.
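
    To show why a handful of prompts is not measurement, here is a small sketch that puts an uncertainty range around a brand mention rate using the Wilson score interval. The choice of interval and the sample counts are my assumptions for illustration; they are not taken from any specific GEO tool.

```python
import math


def wilson_interval(mentions, samples, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if samples <= 0:
        raise ValueError("need at least one sample")
    p = mentions / samples
    denom = 1 + z * z / samples
    centre = (p + z * z / (2 * samples)) / denom
    half = z * math.sqrt(p * (1 - p) / samples + z * z / (4 * samples * samples)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Same observed mention rate (30%), very different sample sizes.
small = wilson_interval(3, 10)
large = wilson_interval(90, 300)
print("10 samples: ", tuple(round(x, 3) for x in small))
print("300 samples:", tuple(round(x, 3) for x in large))
```

    Both runs observe the same 30% mention rate, but the interval from ten samples is far wider than the one from three hundred, which is the practical difference between a screenshot and a measurement.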

    The Brand Detection Problem

    One issue that caught me off guard early on was brand detection. It sounds simple: scan the AI response for your brand name and count mentions. But many brands share names with common words. Think “Apple” the fruit versus Apple the company, or “Teams” the Microsoft product versus teams of people.

    When I first tested detection scripts against real AI outputs, false positives were everywhere. A response about workplace collaboration would mention “teams” five times without once referring to Microsoft Teams. You need entity disambiguation, not string matching. Tools like those reviewed by Search Engine Journal are starting to address this, but it remains a weak spot across the industry.

    The solution involves building custom detection layers that understand context. You look at surrounding words, the prompt category and the typical entities that appear together. It is slow, manual work. But without it, your visibility score is fiction.
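
    A rough sketch of what such a context-aware detection layer might look like. It only counts a brand token when nearby words suggest the product or company. The context vocabulary in BRANDS, the window size, and the example sentences are all hypothetical; a production layer would also use the prompt category and proper entity disambiguation.

```python
import re

# Hypothetical context vocabulary; in practice this would be curated per brand.
BRANDS = {
    "Teams": {"microsoft", "meeting", "channel", "chat", "slack", "zoom"},
    "Apple": {"iphone", "mac", "ios", "cupertino", "device"},
}


def brand_mentions(text, brand, window=6, min_hits=1):
    """Count occurrences of `brand` whose nearby words suggest the company or product."""
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    context_terms = BRANDS[brand]
    count = 0
    for i, token in enumerate(tokens):
        if token != brand:  # case-sensitive: lowercase "teams" is skipped outright
            continue
        nearby = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
        hits = sum(word.lower() in context_terms for word in nearby)
        if hits >= min_hits:
            count += 1
    return count


generic = "Teams work best when goals are clear, and good teams review progress weekly."
product = "For remote meetings, Microsoft Teams and Zoom are the usual picks."
print(brand_mentions(generic, "Teams"), brand_mentions(product, "Teams"))
```

    Plain string matching would count "Teams" in both sentences; the context check keeps only the second.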

    What You Can Measure Today

    Despite these challenges, useful measurement is possible. Here is what works reasonably well right now:

    • Brand mention frequency across a large sample of prompts related to your category.
    • Sentiment analysis of how AI models describe your brand when they do mention it.
    • Share of voice compared to competitors within specific prompt clusters.
    • Regional differences in brand visibility across key markets.

    These metrics give you directional insight. They tell you whether your GEO efforts are moving the needle. They do not yet give you the precision of a Google Search Console click report. That gap is real, and pretending otherwise helps nobody.
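
    As an illustration of the share-of-voice metric, the sketch below counts in how many sampled responses each brand appears and normalises the counts. The brand names and responses are invented, and the naive substring match is kept only for brevity; it has exactly the detection weakness described earlier.

```python
from collections import Counter


def share_of_voice(responses, brands):
    """Fraction of all brand mentions that belong to each brand.

    Each response counts a brand at most once, so a single verbose
    answer cannot dominate the result.
    """
    counts = Counter()
    for text in responses:
        lowered = text.lower()
        for brand in brands:
            if brand.lower() in lowered:  # naive match, assumption for brevity
                counts[brand] += 1
    total = sum(counts.values())
    return {brand: (counts[brand] / total if total else 0.0) for brand in brands}


# Invented responses for one prompt cluster.
responses = [
    "Acme and Globex are popular.",
    "Many companies choose Acme.",
    "Globex and Initech both fit.",
    "Acme is a solid default.",
]
sov = share_of_voice(responses, ["Acme", "Globex", "Initech"])
print({brand: round(value, 2) for brand, value in sov.items()})
```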

    Tools built specifically for GEO tracking, such as those explored in Moz’s research on AI search, are improving fast. Sampling methods are getting smarter. Regional simulation is more accurate. Brand detection is catching up. But we are still in the early innings.

    Where GEO Measurement Goes From Here

    The trajectory is clear. As more organisations invest in GEO, the demand for reliable measurement will push tooling forward. I expect three shifts over the next twelve months. First, API-level access to AI platforms will allow real-time sampling at scale. Second, standardised metrics will emerge so that brands can benchmark across tools. Third, personalisation modelling will let you estimate visibility for different audience segments, not just a generic “average user.”

    My experience working with early GEO data for clients across multiple sectors has taught me one thing above all: patience matters more than precision right now. Track trends, not snapshots. Compare quarters, not days. Build your measurement practice today so that when the tools catch up, you already have baseline data to measure against.

    GEO measurement is messy, imperfect and improving fast. The brands that accept that reality and invest anyway will be the ones with a head start when the data finally sharpens.

  • What Are Grounding Prompts in LLMs? RAG Explained

    Large Language Models are extremely good at sounding right. That’s also their biggest problem.

    I work daily with teams using LLMs in production environments where accuracy matters: SEO, GEO, analytics, product documentation, and decision support. One pattern shows up over and over again: answers that look correct, feel authoritative, and still contain information that simply isn’t true.

    Most commonly, this shows up as invented URLs.

    This is not anecdotal. It is a direct consequence of how LLMs work.

    I took inspiration from Martina Raissle’s LinkedIn post, which shows a clear example of this.

    Why LLMs invent URLs so convincingly

    When an LLM prints a URL directly in the response, it is usually not retrieving it. The model is predicting what a plausible URL should look like based on patterns learned during training.

    It is not checking:

    • whether the page exists
    • whether the domain structure is correct
    • whether the content behind the URL is real

    URLs are especially vulnerable because they follow predictable patterns. To a language model, /blog/2024/llm-grounding.html looks just as valid as a real page.

    This behavior is a well-documented form of hallucination in LLMs. The model is doing exactly what it was trained to do: generate likely text, not verify facts.

    Generated answers vs grounded answers

    In modern LLM systems, there is a fundamental distinction that often gets blurred in practice.

    Model-only generation

    • Fluent and confident
    • Can invent facts, numbers, and URLs
    • No external verification step

    RAG-backed generation (Retrieval-Augmented Generation)

    • Retrieval happens first
    • The model is constrained to retrieved documents
    • Claims and URLs come from real sources

    When you see answers accompanied by a citation block or source panel, that is typically a signal that retrieval has occurred. The citation is not decorative. It is metadata from the retrieval step.

    When you don’t see this, you are usually looking at pure model output.

    Why “grounding-style” prompts are not enough

    This is where many teams get caught off guard.

    Prompts such as:

    • “Only answer using factual sources”
    • “Make sure the answer is grounded”
    • “Cite your sources”

    sound reasonable, but they do not guarantee that retrieval is enforced.

    The reason is simple:

    • Prompting is an instruction
    • Grounding is an architectural constraint

    Unless the system explicitly:

    • runs retrieval before generation, and
    • blocks generation when no documents are returned,

    the model can — and will — fall back to its internal knowledge and fill in the gaps.

    In short:

    Prompting ≠ enforcement
    Instruction ≠ system guarantee

    Having said that, one prompt I have found often works is the following:

    You must answer using retrieved sources only.
    
    Rules:
    - Before generating an answer, retrieve relevant documents.
    - If no relevant sources are found, respond with:
      "I don’t have sufficient sourced information to answer this."
    - Do not use prior knowledge or training data.
    - Do not infer, guess, or complete missing information.
    - Do not generate URLs unless they appear verbatim in the retrieved sources.
    - All factual claims must be supported by a cited source.
    
    Output format:
    - Answer
    - Sources (with exact URLs or document identifiers)
    
    User question:
    {question}

    The prompt works because it:

    • allows refusal
    • removes pressure to “be helpful at all costs”
    • replaces preferences with constraints
    • aligns model behavior with RAG logic
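
    Because a prompt is still only an instruction, the rule about URLs can also be checked in code after generation. The sketch below flags any URL in an answer that does not appear verbatim in the retrieved documents. The function names, the URL regex, and the sample strings are assumptions for illustration.

```python
import re

# Deliberately simple URL pattern; an assumption, not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(text):
    """Return URLs found in text, with trailing punctuation and quotes removed."""
    return [url.rstrip(".,;:!?\"'") for url in URL_RE.findall(text)]


def unsupported_urls(answer, retrieved_docs):
    """URLs in the answer that do not appear verbatim in any retrieved document."""
    allowed = set()
    for doc in retrieved_docs:
        allowed.update(extract_urls(doc))
    return [url for url in extract_urls(answer) if url not in allowed]


docs = ["The guide at https://example.com/docs/grounding explains retrieval."]
answer = (
    "See https://example.com/docs/grounding and "
    "https://example.com/blog/2024/llm-grounding.html."
)
flagged = unsupported_urls(answer, docs)
print(flagged)
```

    Anything this check flags can be stripped or can trigger a refusal, instead of being shown to the user as if it were sourced.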

    Why this matters in real-world use cases

    In many professional workflows, grounding-style prompts are treated as a trust boundary.

    The expectation is:

    • no sources → no answer

    What often happens instead:

    • no sources → confident guess

    That is not a user mistake. It is a system design issue.

    In environments where accuracy matters — SEO, legal, medical, finance, analytics, or executive reporting — a system that fails open is more dangerous than one that refuses to answer.

    How reliable RAG systems handle this

    Well-designed RAG systems tend to share the same principles:

    • Retrieval is mandatory, not optional
    • Answers are clearly labeled as retrieved vs model-derived
    • Generation is blocked if no documents are found
    • URLs are only surfaced if they come from retrieval metadata

    This separation between retrieval and generation is what turns LLMs from writing assistants into reliable knowledge systems.
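
    The principles above could be sketched as a small fail-closed pipeline. The retrieve and generate callables are hypothetical stand-ins for a real search index and a real model call; the point is only the control flow: retrieval runs first, generation is skipped when nothing comes back, and source URLs are taken from retrieval metadata.

```python
from dataclasses import dataclass
from typing import Callable, List

REFUSAL = "I don't have sufficient sourced information to answer this."


@dataclass
class Document:
    doc_id: str
    url: str
    text: str


@dataclass
class Answer:
    text: str
    sources: List[str]
    grounded: bool


def answer_question(
    question: str,
    retrieve: Callable[[str], List[Document]],
    generate: Callable[[str, List[Document]], str],
) -> Answer:
    docs = retrieve(question)  # retrieval is mandatory and always runs first
    if not docs:
        # Fail closed: the model is never called without sources.
        return Answer(text=REFUSAL, sources=[], grounded=False)
    text = generate(question, docs)
    # URLs come from retrieval metadata, never from the generated text.
    return Answer(text=text, sources=[d.url for d in docs], grounded=True)


# Stubs standing in for a real retriever and model.
calls = []


def fake_generate(question, docs):
    calls.append(question)
    return f"Summary based on {len(docs)} document(s)."


empty = answer_question("q1", lambda q: [], fake_generate)
found = answer_question(
    "q2",
    lambda q: [Document("d1", "https://example.com/a", "some text")],
    fake_generate,
)
print(empty.text)
print(found.text, found.sources)
```

    The grounded flag is one way to label an answer as retrieved or not, so the interface never has to guess.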

    A practical heuristic that works today

    Until grounding is enforced everywhere, one practical rule helps reduce risk:

    • Raw URLs written inline → treat with skepticism
    • URLs shown in citation / source blocks → far more reliable

    This is not theoretical. In practice, it is one of the strongest signals available today for identifying hallucinated references.

    Based on hands-on use in production systems:

    • This behavior is expected given how LLMs work today
    • Prompts alone cannot guarantee grounding
    • Only enforced retrieval plus constrained generation can

    If correctness matters in your workflow, grounding is not optional. It is foundational.




  • How to Make Your CMS Crawl-Ready for LLMs (2025)

    At adaptTo() 2025 in Berlin, Sinem Dere and I shared our experience helping large websites prepare for the next frontier of discoverability: how content is interpreted and surfaced by Large Language Models (LLMs) and AI-driven search systems.

    For the past few years, I’ve been exploring how SEO principles evolve when search becomes conversational, and how we can make our content retrievable, trusted, and cited by AI systems. This new discipline is what I call Generative Engine Optimization (GEO), an evolution of SEO for the LLM era.

    From SEO to GEO: Visibility in the Age of AI

    Traditional SEO was about ranking high on Google. In the world of LLMs, visibility means something new: being mentioned, cited, or retrieved as a trusted source inside AI-generated answers.

    People don’t always visit websites anymore; they ask models like ChatGPT, Gemini, or Perplexity. These systems synthesize content from what they can crawl, retrieve, and trust. If your content isn’t accessible or comprehensible to these systems, you effectively don’t exist in AI search.

    That’s what GEO focuses on: ensuring content is structured and written in a way that LLMs can easily understand and reference.

    The fundamentals of SEO haven’t disappeared. Good content, fast load times, and logical structure still matter. What’s changed is the end-destination of that content: from the SERP to the model’s answer.

    How I Got Into GEO

    My journey into GEO started with a simple observation: while optimizing for traditional SEO (fixing site speed, improving markup, cleaning up structure), some websites started getting unexpected traffic from AI assistants.

    Users were discovering brands through AI conversations, not just search results. That insight sparked a process of testing, measuring, and validating how LLMs retrieve and use web data.

    I began analyzing crawl logs, comparing LLM bot activity with traditional crawlers, and building hypotheses around what makes a page “LLM-visible.”

    Over time, patterns emerged:

    • Sites that were well-structured, semantic, and accessible appeared more often in AI answers.
    • Pages relying heavily on JavaScript rendering were often invisible.
    • Google ranking and LLM citations were correlated: what ranks well tends to get retrieved.

    That realization changed how I approach SEO entirely.

    The Role of Retrieval and RAG

    One of the key technical enablers behind this new reality is Retrieval-Augmented Generation (RAG), the mechanism LLMs use to search the web and include live data in their answers.

    When someone asks “What’s the best PDF editor?” or “How do I fill out a contract template?”, LLMs go out, find relevant URLs, retrieve snippets, and integrate them into their responses, often citing the sources.

    If your website isn’t retrievable, it isn’t citable. RAG has made “ranking” less about links and more about being reachable and comprehensible by AI.

    To succeed in GEO, your content must be:

    1. Crawlable – accessible to AI crawlers and free of JS-only content.
    2. Retrievable – indexed properly, with clear signals of topic and authority.
    3. Usable – written in a way that models can extract context and facts without ambiguity.

    This combination of accessibility and clarity is what I refer to as being Crawl-Ready.

    Making Your Site Crawl-Ready for AI Systems

    Here are some best practices I’ve learned through experiments and log analysis:

    • Open the Door to AI Crawlers: Allow known LLM user agents (like GPTBot or PerplexityBot) in your robots.txt and verify via server logs that they’re actually visiting.
    • Avoid JS-Only Rendering: Use static rendering or hybrid SSR for key content areas.
    • Structure Content Semantically: Use logical headings, clear sections, and short paragraphs. Add FAQ or summary sections where relevant; these often get cited directly.
    • Ensure Internal Linking and Pagination: Avoid infinite scroll or AJAX-loaded content; make sure crawlers can access all pages.
    • Monitor Crawl Logs: Watching how bots interact with your site reveals what’s being retrieved and what’s missed.
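
    The first practice is easy to check offline. This sketch uses Python's standard urllib.robotparser to test whether a given robots.txt lets specific AI user agents fetch a URL. The robots.txt content and the example URL are made up; GPTBot and PerplexityBot are the agents named above.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "PerplexityBot"]


def allowed_agents(robots_txt, url, agents=AI_AGENTS):
    """Return {agent: True/False} for whether each agent may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Made-up robots.txt that blocks one AI crawler and allows everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
result = allowed_agents(robots_txt, "https://example.com/products/")
print(result)
```

    A robots.txt check only tells you what you permit. Whether the crawlers actually visit, and whether a CDN or WAF blocks them anyway, still has to be confirmed in the server logs.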

    These are the same technical principles that made sites perform well in SEO, just with a new purpose: helping LLMs read and reuse your content accurately.

    Lessons from Testing and Real-World Data

    The most important lesson I’ve learned in GEO is this: don’t trust assumptions, test everything.

    Over the past year, I’ve seen dozens of claims about “new SEO for AI” tactics, like adding llms.txt or changing meta tags. When I tested them across large domains, most showed no measurable effect.

    In one 30-day analysis of enterprise-level sites, major LLM crawlers didn’t even access llms.txt once. It’s a good reminder that evidence beats hype, the same principle SEO professionals have always followed.

    Why SEO Still Matters (Even More Now)

    There’s a misconception that SEO is dying because of AI. In reality, it’s becoming the foundation for AI visibility.

    My own data shows that if a website ranks within the top two pages of Google, it’s significantly more likely to be cited by LLMs. Models rely on high-authority, well-ranked pages as their reference layer.

    So instead of “SEO vs AI,” think “SEO for AI.” Your existing best practices (clean structure, good content, trusted links) are what make your brand retrievable, citable, and trustworthy to AI systems.

    The Future of GEO

    Generative Engine Optimization isn’t a replacement for SEO; it’s its logical next step. It blends technical SEO, content strategy, and data analysis with a new goal: to make content usable by machines, not just readable by humans.

    Over the next few years, I expect:

    • Crawl-to-Answer pipelines will become measurable KPIs.
    • LLM visibility will be tracked just like organic rankings.
    • “Crawl-Ready” will become as common a phrase as “Mobile-Friendly.”

    We’re entering a time when search isn’t just typed, it’s generated. And that means every brand must adapt their content to be understood by both humans and models.

    Closing Thoughts

    The talk at adaptTo() was one of my favorite moments this year, not because of the slides or demo, but because it showed how far we’ve come as an industry.

    SEO has always evolved with technology. This is just the next stage, where visibility depends not only on what you publish, but how machines interpret it.

    “Where do you hide a dead body?
    On the second page of Google.”

    These days, maybe also in a dataset no LLM retrieves.


  • Do LLMs Crawl Markdown (.md) Files? Data Analysis

    As large language models (LLMs) such as ChatGPT, Claude and other generative AI systems reshape how people discover and consume information, a recurring question for digital marketers and content strategists is whether these models work directly with Markdown files (.md).

    Markdown is widely used by developers and documentation teams as a lightweight, human‑readable authoring format. But does it play a role in how LLMs crawl and consume web content?

    Recently, I carried out a targeted log‑file analysis to better understand how (or if) .md files are surfaced to LLM crawlers.

    Summary of Findings:

    • LLMs ignore Markdown files — log analysis showed no evidence that GPTBot, ClaudeBot, or similar crawlers request or prioritise .md content, even when listed in llms.txt.
    • HTML remains the standard — structured, cached HTML is consistently the most reliable and supported format for both search engines and LLMs.
    • No ROI for .md delivery — maintaining Markdown alongside HTML adds overhead without proven gains in visibility, brand mentions, or indexing.
    • Use Markdown internally only — it remains valuable for documentation and workflows, but should not be treated as a delivery layer for AI optimisation.
    • Optimise for what works today — focus efforts on clean, semantic HTML, caching strategies, and accessibility rather than speculative standards like llms.txt.

    Why are we discussing markdown (.md) in the first place?

    Because the llms.txt proposal itself recommends Markdown:

    …We furthermore propose that pages on websites that have information that might be useful for LLMs to read provide a clean markdown version of those pages at the same URL as the original page, but with .md appended. (URLs without file names should append index.html.md instead.)…

    Do LLMs Use .md Files?

    • LLMs are not prioritizing .md files: log analysis showed no requests from GPTBot, ClaudeBot, or other AI crawlers, even when .md files were listed in llms.txt.
    • Authoritative domains are no exception: two sites with DA > 90 and millions of daily visits had zero LLM traffic over a 24-hour window.
    • HTML remains the standard: well-structured, cached HTML is still the most reliable format for both search engines and LLMs.
    • No clear marketing ROI for .md: maintaining .md alongside HTML adds overhead, with no proven visibility or brand-mention benefits.

    Recommendation: Focus optimization on HTML outputs; treat .md as an internal content format, not as a delivery layer for LLMs.


    Scope of the Analysis

    To keep the study practical but still representative, the parameters were defined as follows:

    • Duration: 24 hours of raw CDN log data
    • Sample: Two high‑traffic websites with millions of daily visits
    • Domain authority: Both sites carried DA > 90
    • Configuration: Each site exposed llms.txt files listing selected .md resources for potential crawling
    • New content: At least one fresh page on each site was published within the analysis window

    One limitation is the relatively short duration. A full week’s logs would have meant processing well over 150 GB of data, which was not technically feasible for this initial phase.
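
    For readers who want to repeat this kind of check on their own logs, here is a minimal sketch that counts requests for .md files and llms.txt per recognised LLM bot. It assumes the logs have already been parsed into (request target, user agent) pairs; the token list and the sample records are illustrative only and are not data from this analysis.

```python
from collections import Counter
from urllib.parse import urlsplit

LLM_BOT_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def md_requests_by_bot(records):
    """Count requests for .md paths and llms.txt, grouped by LLM bot.

    `records` is an iterable of (request_url_or_path, user_agent) tuples.
    Requests from agents not in LLM_BOT_TOKENS are grouped under "other".
    """
    counts = Counter()
    for target, user_agent in records:
        path = urlsplit(target).path.lower()  # ignore query strings
        if not (path.endswith(".md") or path.endswith("/llms.txt")):
            continue
        ua = user_agent.lower()
        bot = next((t for t in LLM_BOT_TOKENS if t.lower() in ua), "other")
        counts[bot] += 1
    return counts


# Made-up records for illustration.
records = [
    ("/docs/guide.md", "SomeScraper/1.0"),
    ("/llms.txt", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    ("/docs/guide.html?ref=x.md", "GPTBot/1.0"),
    ("/index.html", "ClaudeBot/1.0"),
]
counts = md_requests_by_bot(records)
print(dict(counts))
```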


    Analysis

    • No identifiable LLM crawler traffic: Across both sites, no access requests were logged from recognised LLM bots such as GPTBot, ClaudeBot, or PerplexityBot.
    • No .md file retrieval: Although .md files had been explicitly referenced within the llms.txt directive, there was no evidence of these being fetched by any bot claiming to represent an LLM provider.
    • What about other bots? I found assorted scraping bots hitting the files, but no major traditional search engine crawlers.
    • What are you trying to accomplish by having an .md file? The biggest question really is why you would use .md files in the first place. What would be your end goal?
      • Increased context for LLMs: Some GEO specialists suggest adding more textual information to the .md files that would otherwise not be relevant to the end user. In my opinion this is not wise, as it can easily be considered a form of manipulation.

    Why would you even consider using an .md file?

    The question really comes down to your goal: why do you want to use .md files in the first place, and what problem are you trying to solve?

    1. Improve crawling rates:

    Imagine you have a website with millions of new UGC (user-generated content) assets being published each day. Search engines often struggle to keep up with indexing at that volume. Using .md files could look like a solution, but I don’t believe it is a wise way to proceed. Why?

    • You are generating an additional asset for LLMs to crawl, on top of the original URL.
    • .md files do not have an HTML head, where you could declare a relationship to the main document, such as rel=canonical.
      • This would therefore need to be done in the HTTP response instead, via a Link header with rel="canonical".
    • There are other formats, such as plain .html, that could be served just to the bot, as in the process below.

    Process:

    1. A user agent that includes “bot” makes a request for a given page.
    2. The system delivers a pre-rendered page.
      What is a pre-rendered page? It is a page whose JS has been fully rendered in a browser, with that rendered version cached. ZERO JS is therefore needed to display the content on the page. The main limitation is that certain JS interactions will not work, but the content is visible.
    3. This file has exactly the same URL, so it is not an additional asset for the bot to render.
    4. The .html version is cached on a CDN to improve speed.

    Step | User | Bot
    User-agent detection | Gets the standard version of the page with JS | Gets a clean, optimised, cached version that needs no JS because all client-side JS has already been rendered

    The cached file then sits on a fast CDN and is syndicated.

    I’ve done this exact implementation for this type of website, and we were able to increase the indexing rate from 30% to 93% in an 18-month window. Why? Because it is estimated that, for every one JS-rendered page, a crawler could process 100 non-JS pages with the same computing resources. It is therefore logical to offer a version like this.

    But is this not cloaking? No, we deliver EXACTLY the same textual version that the browser would see once JS is rendered.

    Definition: Cloaking is when you intentionally try to manipulate search engines by hiding content from them, or by adding content to a version that is served only to them.
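
    The pre-rendering process described above could be sketched roughly as follows. This is a simplified illustration, not the production implementation: the render_page callable is a hypothetical stand-in for a headless-browser renderer, and an in-memory dictionary stands in for the CDN cache.

```python
def is_bot(user_agent):
    """Step 1: treat any user agent containing "bot" as a crawler."""
    return "bot" in user_agent.lower()


class PrerenderRouter:
    """Serves the JS app to users and a cached, pre-rendered page to bots, on the same URL."""

    def __init__(self, render_page, app_shell):
        self._render_page = render_page  # hypothetical: returns fully rendered HTML for a URL
        self._app_shell = app_shell      # the normal JS-driven page served to users
        self._cache = {}                 # stand-in for the CDN cache

    def respond(self, url, user_agent):
        if not is_bot(user_agent):
            return self._app_shell
        if url not in self._cache:
            # Render once, then serve the cached copy to every later bot request.
            self._cache[url] = self._render_page(url)
        return self._cache[url]


# Stub renderer that records how often it is called.
render_calls = []


def fake_render(url):
    render_calls.append(url)
    return f"<html><body><h1>Rendered {url}</h1></body></html>"


router = PrerenderRouter(
    fake_render,
    app_shell="<html><body><div id='app'></div><script src='/app.js'></script></body></html>",
)
bot_html = router.respond("/products", "Mozilla/5.0 (compatible; Googlebot/2.1)")
router.respond("/products", "Mozilla/5.0 (compatible; bingbot/2.0)")
user_html = router.respond("/products", "Mozilla/5.0 (Windows NT 10.0) Chrome/120")
print(len(render_calls), "render call(s) for two bot requests")
```

    For this to stay on the right side of the cloaking definition, the renderer has to produce exactly the text a browser would show once JS has run, nothing more and nothing less.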

    2. Heavy JS rendered content

    Imagine you have a website that hosts PDF files. For the end user, you display the PDF in a JS-heavy viewer. Search engines are unable to see the content of that PDF, and you can’t expose the PDF file itself because users would steal the files. How do you expose it to search engines / LLMs?

    In this context, using an .md file would make logical sense, but my main concern is how LLMs would understand the relationship between the two versions when they cite a source in an answer. Would they ever reference an .md file? I think not.

    So again, the cleaned-up .html file would win here.

    • Only one URL for the same asset, which helps crawl budget and indexation.

    Questions I Am Frequently Asked

    My site is rendered entirely client‑side with JavaScript. Should I consider publishing content in Markdown to make it accessible?

    No. A better approach is to ensure you serve a pre‑rendered HTML version and cache it properly. This way, the content remains crawlable to both LLM bots and traditional search crawlers.

    Are there cases where Markdown should be used as a direct LLM optimisation layer?

    Not really. Markdown is useful internally—for documentation or content maintenance—but optimisation for LLMs should focus on clean, structured HTML output, not Markdown.

    Do .md files improve visibility in large language models (LLMs) compared to standard HTML pages?

    No. Current evidence shows LLMs do not prioritize .md files over HTML. Well-structured HTML remains the most reliable format for visibility.

    If my site is client-side rendered (JS heavy), would exposing .md files help LLMs or search engines access my content more easily?

    No. A better solution is to use an HTML renderer and cache the results. This ensures both search engines and LLMs can properly access your content.

    Are there proven cases where .md files increased brand mentions or visibility in generative AI search results?

    Not yet. While some proposals suggest benefits, no third-party studies or log data confirm that .md exposure improves brand visibility in LLM outputs.

    How much additional maintenance effort would it take to manage .md versions of my pages alongside existing HTML—and is the ROI justified?

    Maintaining parallel .md and HTML versions increases workload and risk of outdated content. At this stage, the ROI is unproven.

    Should I list .md files in llms.txt to signal them to AI crawlers, or is it better to optimize the HTML output we already have?

    Optimize HTML. Listing .md files in llms.txt is experimental and currently unused by major LLM bots. HTML optimization offers a clearer path to results.


    Third‑Party Context

    Independent sources in both search and AI research echo these observations:

    • Yoast SEO notes that although llms.txt was proposed as a kind of robots.txt for AI crawlers, no major LLM provider currently supports it. Neither GPTBot, Claude, nor Google’s AI products read Markdown or llms.txt as part of their active crawling routines (Yoast, 2024).
    • Daydream’s analysis warns that managing .md‑based feeds can introduce risks of data divergence—where Markdown goes out of sync with published HTML. This could actually harm brand accuracy if models ingest outdated content (Daydream Library, 2024).
    • Academic work (HtmlRAG study, arXiv, 2024) tested retrieval‑augmented generation (RAG) pipelines and found that HTML retained semantic structure—headings, metadata, table layouts—that plain text or Markdown often strips away. These structural signals improved contextual knowledge retention and retrieval performance, supporting the argument that HTML delivers more value to LLM ingestion workflows.

    Collectively, these insights align with the practical results of the log‑file study.


    Recommendations based on my research, experience and observation

    • Do not publish directly in Markdown for LLM visibility. Keep Markdown for internal versioning and workflows.
    • Focus on HTML as the public output layer. Ensure semantic tags are used and the pages are properly cached.
    • Do not rely on llms.txt today. It is an experimental idea with very limited adoption.
    • Prioritise accessibility and clarity of HTML outputs over trying to second‑guess speculative AI standards.

    Conclusion

    This analysis, while narrow in scope, makes one point clear: LLMs are not actively crawling or requesting Markdown files, even when explicitly listed in llms.txt. Instead, industry evidence shows that AI ingestion pipelines focus on HTML‑rendered content, which provides richer context and stronger retrieval signals.

    For now, organisations should maintain emphasis on accessible, semantically‑structured HTML, coupled with robust caching strategies. Markdown remains valuable as an internal content authoring format, but it is not a shortcut to visibility within LLM ecosystems.


  • LLMs.txt: Why AI Crawlers Ignore It (2025 Audit)

    Updated: June 2026 · A new article on llms.txt has been published that extends this earlier write-up.

    This analysis aims to review the usage of LLMs.txt files in the context of LLMs.

    How the analysis was performed: I audited 30 days of raw CDN logs for 1,000 Adobe Experience Manager domains to see who actually requests the file. The results were, frankly, brutal.

    Findings of the LLMs.txt audit:

    • LLM-specific bots stayed away. No GPTBot, ClaudeBot, PerplexityBot, or similar were seen at all.
    • Google still probes everything. Its desktop crawler accounted for 95% of all hits.
    • Bing is curious but inconsistent. Only seven requests, concentrated on one domain (out of one thousand).
    • OpenAI’s search bot was minimal. Ten calls from OpenAIBotSearch. GPTBot itself was absent.
    • SEO tools inflated the logs. Tools like Semrush Mobile and SiteAudit caused many hits, unrelated to LLMs.

    | Rank | User-agent | Share of all llms.txt hits |
    |---|---|---|
    | 1 | GoogleBotDesktop | 94.9% |
    | 2 | OpenAIBotSearch | 1.1% |
    | 3 | ScanPire | 0.8% |
    | 4 | BingBot | 0.8% |
    | | Eight other bots | <1% each |
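
    A tally like the one above can be reproduced from access logs with a few lines of Python. This is only a minimal sketch: it assumes logs in the Apache/Nginx "combined" format (CDN log layouts usually differ), the sample lines and simplified user-agent strings are made up, and the function names `tally_llms_txt_hits` and `shares` are hypothetical.

```python
import re
from collections import Counter

# Assumption: "combined" log format; real CDN logs often use other layouts.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '   # request line
    r'(?P<status>\d{3}) \S+ '                       # status and bytes
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'     # referrer and user agent
)


def tally_llms_txt_hits(lines):
    """Count requests to /llms.txt per user-agent string."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        path = match.group("path").split("?", 1)[0]  # drop any query string
        if path == "/llms.txt":
            hits[match.group("agent")] += 1
    return hits


def shares(hits):
    """Turn raw counts into percentage shares, largest first."""
    total = sum(hits.values())
    return [(agent, 100 * count / total) for agent, count in hits.most_common()]


# Made-up sample lines for illustration only.
sample = [
    '203.0.113.5 - - [01/Aug/2025:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '203.0.113.6 - - [01/Aug/2025:10:01:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '203.0.113.7 - - [01/Aug/2025:10:02:00 +0000] "GET /llms.txt?x=1 HTTP/1.1" 404 0 "-" "OAI-SearchBot/1.0"',
    '203.0.113.8 - - [01/Aug/2025:10:03:00 +0000] "GET /index.html HTTP/1.1" 200 900 "-" "Googlebot/2.1"',
]

hits = tally_llms_txt_hits(sample)
for agent, share in shares(hits):
    print(f"{agent}: {share:.1f}%")
```

    User-agent strings can be spoofed, so a real audit would also want to verify crawlers by IP range or reverse DNS before trusting the labels.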

    Why Aren’t LLMs going to the llms.txt file?

    1. The spec is still unofficial. No LLM lab has committed to honoring it yet.
    2. Most training uses pre-built datasets, such as Common Crawl or books, not live fetches.
    3. Robots.txt already covers them. Major labs honor standard tokens like GPTBot and ClaudeBot.
    4. It’s not cost-effective. Probing llms.txt on every domain wastes crawl budget.

    What are my recommendations for site owners in relation to llms.txt

    This really depends on how difficult it is to implement the llms.txt file. If you feel it would be relatively easy to create the file, then go for it. If it requires a large amount of resources, then I’d recommend you hold back until we clearly see benefits.

    For example, this domain uses the llms.txt file at https://www.longato.ch/llms.txt because it was easy to implement.

    • Use robots.txt instead. It’s the only widely respected barrier today.
    • Watch your logs. Use tools like Grafana or BigQuery to detect AI crawlers directly.
      • Remember: if you use a CDN, you need to look at the logs within the CDN.
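
    Before relying on robots.txt as the barrier, it helps to check that the rules do what you expect. The sketch below uses Python's standard `urllib.robotparser` on a hypothetical robots.txt that blocks two AI crawler tokens and allows everyone else; the rules and the URL are illustrative assumptions, not a recommended policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two AI crawler tokens, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in memory, no network request

url = "https://example.com/articles/some-page"
for agent in ("GPTBot", "ClaudeBot", "Googlebot"):
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent}: {verdict}")
```

    This only tests how a compliant parser reads the rules; whether a given crawler actually honors them still has to be confirmed in the logs.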

    What Might Change Soon for LLMs.txt

    As of now (August 2025), there are no major announcements from LLM providers in relation to llms.txt.

    | Provider | Current stance on llms.txt | Signal to watch |
    |---|---|---|
    | OpenAI | No support announced | GPTBot documentation updates |
    | Google / Gemini | Monitors but uses Google-Extended | Revisions to Google-Extended policy |
    | Microsoft / Copilot | Silent | BingBlog crawler updates |
    | Meta | No mention | Meta crawler policy changes |
    | Anthropic | No mention | ClaudeBot UA policy |

    Is there any external validation of my findings?

    | Date | Key development | Who said / did it | Take‑away |
    |---|---|---|---|
    | 17 Jun 2025 | “FWIW no AI system currently uses llms.txt.” | John Mueller, Google, on Bluesky | Google confirms zero support and no immediate plans. (Search Engine Roundtable) |
    | 19 Jun 2025 | ScaleMath publishes an adoption‑tracker deep‑dive. | Independent analysts | Finds early enthusiasm among dev‑doc sites but no proof of LLM consumption. (ScaleMath) |
    | 02 Jul 2025 | PPC Land headline – “llms.txt adoption stalls as major AI platforms ignore proposed standard”. | Industry press | OpenAI, Google, Anthropic still not honoring the file. (PPC Land) |
    | 22 Jul 2025 | Mueller advises adding X‑Robots‑Tag: noindex to llms.txt to avoid clutter in Google results. | Google | Tactical hygiene tip; doesn’t affect crawling behaviour. (Stan Ventures) |
    | 24 Jul 2025 | Logs show OpenAI’s crawler fetching llms.txt every ~15 min on some sites. Google’s Gary Illyes repeats “we won’t support it.” | Search Engine Roundtable | Anecdotal evidence OpenAI is testing discovery, not an official endorsement. (Search Engine Roundtable) |
    | Late Jul 2025 | Server‑log studies detect sporadic hits from other AI bots but no sustained utilisation. | ArcherEdu analytics | Suggests experiments, not production use. (archeredu.com) |
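
    The noindex tip from the 22 Jul entry is normally applied in web server or CDN configuration. Purely as an illustration of the resulting response, here is a tiny WSGI app that serves /llms.txt with the header and a helper that calls it in-process. The app, the `fetch` helper and the placeholder file body are all hypothetical.

```python
from wsgiref.util import setup_testing_defaults

LLMS_TXT_BODY = b"# Example Site\n\n> Placeholder llms.txt content.\n"


def app(environ, start_response):
    """Tiny WSGI app: serve /llms.txt with an X-Robots-Tag: noindex header."""
    if environ.get("PATH_INFO") == "/llms.txt":
        headers = [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("X-Robots-Tag", "noindex"),  # ask search engines not to index the file
        ]
        start_response("200 OK", headers)
        return [LLMS_TXT_BODY]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found\n"]


def fetch(path):
    """Call the app in-process (no network) and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = fetch("/llms.txt")
print(status, headers.get("X-Robots-Tag"))
```

    As the table notes, the header affects indexing only; it does not change whether crawlers fetch the file.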

    Where to Go from Here

    • Automate deployment of llms.txt across all properties using your CMS or server configuration.
    • Audit quarterly. LLM behavior evolves fast—track what’s changed.
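
    Automating deployment can start with generating the file from data the CMS already holds. The sketch below renders an llms.txt body in the Markdown layout described at llmstxt.org (an H1 title, a blockquote summary, then H2 sections with link lists). The function name `build_llms_txt` and the site data are hypothetical, and how the output gets published depends on your CMS.

```python
def build_llms_txt(site_name, summary, sections):
    """Render an llms.txt body: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            entry = f"- [{title}]({url})"
            if note:
                entry += f": {note}"  # optional description after the link
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data for illustration only.
sections = {
    "Guides": [
        ("Heading structure", "https://example.com/guides/headings", "How to nest h1 to h6"),
        ("Log analysis", "https://example.com/guides/logs", ""),
    ],
}

text = build_llms_txt("Example Site", "Practical notes on SEO and LLM visibility.", sections)
print(text)
```

    Generating the file from one source of truth also makes the quarterly audit easier, because a diff of the output shows what changed.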

    Bottom line: llms.txt is a good idea in theory, but today’s bots don’t read it. Until adoption improves, your best defense remains robots.txt and a clear content policy backed by logs.

    FAQ: Understanding llms.txt

    What is llms.txt and who proposed it?

    llms.txt is a proposed text file format that website owners can place at the root of their domain, for example https://example.com/llms.txt. The goal is to help LLMs discover and index a site’s content.

    Large language models increasingly rely on website information, but face a critical limitation: context windows are too small to handle most websites in their entirety. Converting complex HTML pages with navigation, ads, and JavaScript into LLM-friendly plain text is both difficult and imprecise.
    Source: https://llmstxt.org/

    In addition to this, Markdown (.md) files are used to create raw-text versions of pages, which allows LLM bots to crawl and read the content faster. This is especially important for JavaScript-heavy, client-side-rendered sites.

    Why are they wrong?

    While well-meaning, this recommendation overestimates its real-world effect. As shown in our log analysis, none of the major LLM crawlers (OpenAI’s GPTBot, Anthropic’s ClaudeBot, PerplexityBot, Meta’s crawler, etc.) currently request the llms.txt file. Only traditional SEO crawlers like GoogleBot or BingBot made any contact—and not for training purposes.

    So while it may feel proactive, adding llms.txt today does almost nothing.


  • HTML Headers for SEO and LLM Optimization: Guide

    What are headers in HTML?

    In the context of a web page, “headers” usually refers to the HTML heading elements, <h1> through <h6>, that label sections of content.

    | Tag | Typical purpose | How many per page? |
    |---|---|---|
    | <h1> | Page’s main topic (visible headline) | One is best practice |
    | <h2> | Major subsections | As many as needed |
    | <h3> to <h6> | Nested sub‑topics inside those sections | As depth requires |

    Why they matter

    1. Information hierarchy – They create an outline that screen readers, browsers, and LLM pipelines all rely on to understand page structure.
    2. SEO & answer engines – Search crawlers give extra weight to words inside headings when deciding relevance.
    3. User experience – Clear, sequenced headings let human visitors scan and skip to what they need quickly.

    What they are not

    • HTTP headers – Key‑value pairs that travel in a request/response (e.g., Content-Type: text/html).
    • Graphic headers – Visual banners or hero images; those are layout elements, not semantic headings.

    Keep headings sequential (<h2> under <h1>, <h3> under <h2>, etc.) and always follow a heading with real content—text, image (with alt), code block, table, etc.—so both people and machines get immediate context.

    Key take‑aways

    • HTML headings remain a low‑effort, high‑return signal for both traditional search ranking and modern answer‑engine / RAG pipelines.
    • LLMs parse raw HTML more accurately than flattened text; losing <h1> to <h4> structure hampers retrieval and increases hallucinations.
    • Practical rule‑of‑thumb: one accurate <h1> that states the intent, logical <h2>/<h3> nests, and keep each heading‑bounded section under ~800 tokens for chunk‑level indexing.

    | What headings influence | How Google confirms it |
    |---|---|
    | Relevance scoring & SERP snippets | Google’s Search Essentials singles out “the main heading” as a place to surface query words (Google for Developers) |
    | Result clustering / sitelinks | Gary Illyes reiterates that heading hierarchy helps Google understand topics and sub‑topics (Search Engine Journal) |
    | Crawling efficiency & accessibility signals (page experience) | W3C tutorial: correct rank order (→…) clarifies sections for assistive tech (W3C) |
    | UX metrics that feed engagement‑based ranking | Clear headings cut pogo‑sticking and improve Core Web Vitals “Interaction to Next Paint” via quicker orientation (Mangools recap with John Mueller quote) (mangools) |

    General rules I (Flavio) follow when structuring content with headers:

    1. Never jump levels, e.g. from <h2> to <h4>

    Keep levels sequential within a branch (H2 → H3 → H4). When you’re done, step back out in order (H4 → H3 → H2).

    | Verdict | Why |
    |---|---|
    | Generally sound | WCAG and W3C tutorials warn that “skipping heading ranks can be confusing” for assistive tech, so an <h2> immediately followed by an <h4> is flagged as a potential accessibility issue. (W3C, Pope Tech Blog) |
    | Nuance | A new major section may legitimately start with an <h2> after you closed a deeply nested <h4> subsection. That isn’t a “jump”; it’s just unwinding the hierarchy. (W3C) |
    | SEO / LLM impact | Search engines and vector‑based retrieval don’t penalise level‑skips directly, but a clean outline improves crawl efficiency, snippet generation and chunk‑level recall. (Search Engine Journal, Search Engine Land) |

    2. Every header must have some content (p, img, table)

    • Never leave an empty <h*> element.
    • Prefer at least a short intro paragraph (or an embedded component with alt text) before plunging into sub‑headings; this helps both accessibility and embedding quality.

    | Verdict | Why |
    |---|---|
    | Correct for accessibility | Empty heading tags trigger “Empty heading” errors because screen‑reader users hit a heading with no context. (Equalize Digital, Pope Tech Blog) |
    | But not an absolute HTML requirement | A heading may legitimately introduce a cluster of lower‑level headings without its own paragraph. From an information‑architecture view that’s fine, though you still need non‑empty heading text. |
    | LLM / RAG angle | Vector chunkers (Azure Document Layout, LangChain, etc.) split on headings and expect some token count below each one; a totally barren section becomes a low‑signal chunk. Including a one- or two-sentence lead‑in or an illustrative figure keeps chunks meaningful. (Microsoft Learn) |

    Adding Anchor Text Navigation for Deep Linking

    Anchor text navigation helps both users and search engines identify and access specific parts of a long article. By adding unique IDs (id="") to each heading and linking them internally, you create structured, crawlable paths. This enables search engines and LLM crawlers to deep link to meaningful sections, which can surface as “jump to” links or featured snippets in the SERP.

    Why it matters for SEO and LLM visibility

    • Deep linking: Google can display direct links to subsections (e.g., “Jump to What are headers in HTML?”).
    • Improved understanding: LLMs parse structured anchors to map content relationships and context more accurately.
    • User experience: Visitors can quickly navigate long-form content, reducing bounce rates and increasing time on page.

    Best practices

    • Assign unique, descriptive id attributes to each heading (e.g., id="why-headers-matter").
    • Use consistent naming conventions — avoid stop words and spaces in IDs.
    • Ensure internal links reference those IDs with <a href="#why-headers-matter">.
    • Validate links using browser inspector or Lighthouse to confirm all anchors resolve correctly.
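
    Generating the ids programmatically keeps the naming convention consistent. The following is a minimal sketch, assuming a small illustrative stop-word list and a simple numeric suffix for duplicate headings; `slugify` and `assign_ids` are hypothetical helper names, not part of any CMS API.

```python
import re
import unicodedata

# Assumption: a small illustrative stop-word list; adjust it for your content.
STOP_WORDS = {"a", "an", "and", "are", "of", "the", "to", "what"}


def slugify(heading_text):
    """Turn heading text into a lowercase, hyphen-separated id without stop words."""
    ascii_text = (
        unicodedata.normalize("NFKD", heading_text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    # Fall back to all words if the heading consists only of stop words.
    kept = [w for w in words if w not in STOP_WORDS] or words
    return "-".join(kept)


def assign_ids(headings):
    """Return one id per heading text, adding a numeric suffix to duplicates."""
    seen = {}
    ids = []
    for text in headings:
        base = slugify(text) or "section"
        seen[base] = seen.get(base, 0) + 1
        ids.append(base if seen[base] == 1 else f"{base}-{seen[base]}")
    return ids


print(assign_ids(["What are headers in HTML?", "Why headers matter", "Why headers matter"]))
# prints ['headers-in-html', 'why-headers-matter', 'why-headers-matter-2']
```

    Once ids are published, treat them as stable: changing a slug later breaks any deep link that already points at it.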

    Example implementation

    <h2 id="headers-in-html">What are headers in HTML?</h2>
    <p>Headers provide content hierarchy and context...</p>
    
    <a href="#headers-in-html">Jump to headers section</a>

    Proper anchor-based navigation not only helps Google render sitelinks but also assists LLMs in parsing context at sub-document granularity — a key factor for prompt-driven visibility.
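
    Checking that every in-page link resolves can also be scripted instead of done by hand in the browser inspector. This sketch uses Python's standard `html.parser` to collect ids and `href="#..."` targets; the class and function names are hypothetical, and the sample markup deliberately contains one broken link.

```python
from html.parser import HTMLParser


class AnchorCheck(HTMLParser):
    """Collect element ids and in-page link targets (href="#...")."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.ids.add(attributes["id"])
        href = attributes.get("href") or ""
        if tag == "a" and href.startswith("#") and len(href) > 1:
            self.targets.append(href[1:])


def broken_anchors(html):
    """Return in-page link targets that have no matching id, in document order."""
    checker = AnchorCheck()
    checker.feed(html)
    checker.close()
    return [target for target in checker.targets if target not in checker.ids]


page = """
<h2 id="headers-in-html">What are headers in HTML?</h2>
<a href="#headers-in-html">Jump to headers section</a>
<a href="#why-headers-matter">Jump to why headers matter</a>
"""
print(broken_anchors(page))  # prints ['why-headers-matter']
```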

    Example in this article:

    Because of this implementation, search engines will show that specific section

    Practical checklist you can enforce in templates

    Follow those refinements and your existing rules will serve both accessibility guidelines and modern LLM‑powered search without introducing unnecessary rigidity.

    | Check | Pass criteria |
    |---|---|
    | Sequential levels | Run a crawler test: flag any instance where rank(current) > rank(previous)+1. |
    | Non‑empty headings | Reject builds containing <hN></hN> or headings whose visible text trims to zero characters. |
    | Token budget | Keep heading‑bounded sections at 100 to 800 tokens; if shorter, merge; if longer, split with an extra sub‑heading. |
    | Author training | Teach writers that a heading is a promise: immediately satisfy it with one clear idea (text, figure, code block, etc.). |
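
    The first two checks lend themselves to automation. Below is a minimal sketch using Python's standard `html.parser`; it flags any heading where rank(current) > rank(previous)+1 and any heading whose visible text is empty. The class name `HeadingAudit` and the sample markup are hypothetical, and a real build check would run this over rendered pages.

```python
import re
from html.parser import HTMLParser

HEADING_TAG = re.compile(r"h([1-6])")


class HeadingAudit(HTMLParser):
    """Collect headings and flag skipped levels and empty heading text."""

    def __init__(self):
        super().__init__()
        self.headings = []   # (level, text) in document order
        self.issues = []
        self._level = None   # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        match = HEADING_TAG.fullmatch(tag)
        if match:
            self._level = int(match.group(1))
            self._text = []
        elif tag == "img" and self._level is not None:
            # Alt text counts as visible heading content.
            self._text.append(dict(attrs).get("alt") or "")

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is None or not HEADING_TAG.fullmatch(tag):
            return
        text = " ".join("".join(self._text).split())
        # previous == 0 means no heading has been seen yet.
        previous = self.headings[-1][0] if self.headings else 0
        if not text:
            self.issues.append(f"empty <h{self._level}>")
        if self._level > previous + 1:
            self.issues.append(f"skip from h{previous} to h{self._level}: {text!r}")
        self.headings.append((self._level, text))
        self._level = None


def audit(html):
    """Return a list of heading issues found in the given HTML string."""
    parser = HeadingAudit()
    parser.feed(html)
    parser.close()
    return parser.issues


sample = "<h1>Guide</h1><p>Intro</p><h2>Setup</h2><h4>Details</h4><h3></h3>"
for issue in audit(sample):
    print(issue)
```

    Stepping back out of a nested branch (for example h4 followed by h2) is not flagged, which matches the nuance described earlier: only downward jumps of more than one level count.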

    Why headings matter for LLM optimisation:

    | LLM/RAG concern | Impact of keeping HTML headings |
    |---|---|
    | Semantic chunking | Azure Document Intelligence recommends splitting documents on section headers to preserve meaning (Microsoft Learn) |
    | Information loss in plain‑text pipelines | The 2025 HtmlRAG study shows headings are among the first things lost when HTML is flattened, hurting answer accuracy (arXiv) |
    | Vector search recall | Pinecone (2025) notes that chunk‑level retrieval works when a chunk “makes sense without surrounding text”; heading‑bounded chunks satisfy that rule (pinecone.io) |

    Keep headings in the source you embed or feed into a vector DB; strip presentation CSS, not the structural tags.
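
    To make heading-bounded chunking concrete, here is a minimal sketch that splits HTML at every heading and checks each section against the 100 to 800 token budget from the checklist. The token estimate (about 0.75 words per token) is a rough assumption that stands in for a real tokenizer, and the names `SectionSplitter`, `estimate_tokens` and `section_report` are hypothetical.

```python
from html.parser import HTMLParser

MIN_TOKENS, MAX_TOKENS = 100, 800  # budget from the checklist in this article


class SectionSplitter(HTMLParser):
    """Split HTML into heading-bounded sections of (heading, body text parts)."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.sections = []        # list of [heading text, [body text parts]]
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.sections.append(["", []])  # every heading opens a new section
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return  # ignore text before the first heading
        if self._in_heading:
            self.sections[-1][0] += data
        else:
            self.sections[-1][1].append(data)


def estimate_tokens(text):
    """Rough assumption: about 0.75 words per token; use a real tokenizer in practice."""
    return round(len(text.split()) / 0.75)


def section_report(html):
    """Return (heading, estimated tokens, verdict) for each section."""
    splitter = SectionSplitter()
    splitter.feed(html)
    splitter.close()
    report = []
    for heading, parts in splitter.sections:
        tokens = estimate_tokens(" ".join(parts))
        if tokens < MIN_TOKENS:
            verdict = "short: consider merging"
        elif tokens > MAX_TOKENS:
            verdict = "long: consider splitting"
        else:
            verdict = "ok"
        report.append((heading.strip(), tokens, verdict))
    return report


sample = "<h2>Setup</h2><p>" + "word " * 150 + "</p><h2>Usage</h2><p>Run it.</p>"
for row in section_report(sample):
    print(row)
```

    The split happens on structure rather than on a fixed character width, so each chunk carries its own heading as context.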

    | Area | Do | Avoid |
    |---|---|---|
    | <h1> | One per URL; front‑load the primary key‑phrase but keep it human (≤ 60 chars). | Keyword‑stuffing or duplicating the title tag verbatim. |
    | Hierarchy | Step down sequentially (<h2>, <h3>, <h4>). It’s OK to have many <h2> siblings. | Jumping levels (e.g., <h2> to <h4> with no <h3>). |
    | Chunk length for RAG | Keep each heading‑delimited section under ~800 tokens; add a 50‑token overlap if you split programmatically. | Fixed‑width splits that ignore headings for semantically rich pages. |
    | Accessibility boost | Write headings that describe the section (“Installation steps”, not “Section 2”). Helps screen‑reader navigation (W3C) | Styling a paragraph to look like a heading without the tag. |
    | AEM / CMS hygiene | Enforce heading components in content fragments; block inline CSS that overrides native <h*> sizing. | Allowing authors to choose heading size purely for visual effect. |
    | Monitoring | Crawl your site weekly and flag pages missing <h1> or with > 120 chars in any heading. | Relying on manual QA; issues slip in fast at enterprise scale. |

    Example of well-structured HTML:

    <!-- Example page: Urban Balcony Gardening – The Complete Guide -->
    
    <h1>Urban Balcony Gardening: The Complete Guide</h1>
    <p>From São Paulo to Singapore, city dwellers are converting concrete perches into productive micro‑gardens. This guide distills first‑hand lessons, data‑backed tips, and a pinch of experimentation so you can harvest greens above the street noise.</p>
    
    <h2>1. Why Grow on a Balcony?</h2>
    <p>Rent increases may be out of your control, but fresh basil isn’t. Balcony gardens supply hyper‑local produce, buffer street dust, and create a calming ritual that offsets screen time.</p>
    
      <h3>1.1 Key Benefits</h3>
      <p>Edible foliage reduces food miles to zero, and the act of tending plants lowers cortisol levels according to a 2024 University of Helsinki study. Plus, a lush rail‑box can nudge real‑estate photos up a price tier.</p>
    
      <h3>1.2 Limitations to Plan For</h3>
      <p>Most balconies offer erratic wind tunnels and less than seven hours of sunlight. Weight restrictions typically hover around 350 kg / m²—double‑check your building code before adding ceramic planters.</p>
    
    <h2>2. Planning Your Space</h2>
    <p>A well‑planned 3 m² balcony can outperform a careless 10 m² terrace. Sketch zones for seating, vertical trellises, and a hidden compost bucket before buying a single seed packet.</p>
    
      <h3>2.1 Measuring Light and Wind</h3>
      <p>Track shadow shifts every two hours for a full day. A phone gimbal and time‑lapse mode capture the data without babysitting the camera.</p>
    
        <h4>2.1.1 Using Sun‑Tracking Apps</h4>
        <p>Apps like SunCalc overlay azimuth angles on live AR. Export the CSV to spot high‑radiation pockets where peppers will actually blush red.</p>
    
      <h3>2.2 Choosing Safe Containers</h3>
      <p>Look for food‑grade HDPE symbols and avoid recycled PVC if you intend to grow root vegetables.</p>
    
        <h4>2.2.1 Pot Materials Compared</h4>
        <p>Fabric grow‑bags breathe well but dry quickly; glazed terracotta retains moisture yet can crack in freeze‑thaw cycles. Stainless steel buckets are indestructible but flash‑cook roots in midsummer unless shaded.</p>
    
    <h2>3. Picking the Right Plants</h2>
    <p>Ignore glossy seed‑catalog photos. Instead, shortlist cultivars tested in balcony labs—compact habit, shallow roots, and shade tolerance matter more than botanical exotica.</p>
    
      <h3>3.1 Herbs That Thrive in Pots</h3>
      <p>‘Genovese Compact’ basil bulks up without bolting, while Vietnamese coriander laughs at humid nights that fell supermarket cilantro.</p>
    
      <h3>3.2 Compact Vegetables</h3>
      <p>Try ‘Patio Star’ zucchini: each plant tops out at 45 cm yet still kicks out fist‑sized fruit. Harvest every three days or it will outgrow its welcome.</p>
    
      <h3>3.3 Ornamental Flowers</h3>
      <p>Calendula doubles as pollinator magnet and salve ingredient. Deadhead spent blooms to keep colour going until first frost.</p>
    
    <h2>4. Care and Maintenance Schedule</h2>
    <p>Consistency beats intensity. Ten minutes daily with a moisture meter saves you an hour of emergency triage later.</p>
    
      <h3>4.1 Watering by Season</h3>
      <p>Morning soak in July, dusk sprinkle in January. Bottom‑watering trays cut losses during weekend trips.</p>
    
      <h3>4.2 Fertilising Without Overdoing It</h3>
      <p>Alternate kelp emulsion and worm‑casting tea on a 14‑day cadence. Skip feedings the week before harvesting leafy greens to avoid nitrate spikes.</p>
    
      <h3>4.3 Pest Detection Checklist</h3>
      <p>Inspect undersides of leaves on Wednesdays and Saturdays, brush off aphids with a soft paintbrush, and quarantine any plant that hosts more than ten scale insects.</p>
    
    <h2>5. Troubleshooting Common Issues</h2>
    <p>Even veteran growers lose a seedling now and then. Instead of panic‑pruning, match symptoms to likely causes and adjust a single variable at a time.</p>
    
      <h3>5.1 Yellowing Leaves</h3>
      <p>Chlorosis often signals iron lockout in alkaline potting mixes—flush with rainwater before reaching for supplements.</p>
    
      <h3>5.2 Slow Growth</h3>
      <p>If day‑night temperature swings exceed 12 °C, metabolic slowdown is inevitable. A clear polycarbonate panel acts as a windbreak and thermal buffer.</p>
    
      <h3>5.3 Balcony Weight Limits</h3>
      <p>Swap soil for coco‑coir and perlite blend to halve payload. Group heavy containers near support columns rather than cantilevered corners.</p>
    
    <h2>6. Final Thoughts and Next Steps</h2>
    <p>Document every win and flop in a shared spreadsheet; patterns emerge after two seasons. When neighbours peek over the railing, hand them a homegrown cherry tomato—community is the best fertiliser of all.</p>
    
  • GEO vs AEO vs LLM Optimization: What’s the Difference?

    What do GEO, AEO and LLM Optimization all mean?

    The optimization (improvement) of your content (be it a website, video, images, documents, etc.) so that it has higher visibility in large language models (LLMs) / GPTs.

    GEO acronyms

    | Acronym | Meaning |
    |---|---|
    | GEO | Generative Engine Optimization |
    | AEO | Answer Engine Optimisation |
    | LLM | Large Language Models |

    This table explains each acronym used.

    In essence, it is similar to SEO (Search Engine Optimization). You want LLMs / GPTs to be able to:

    1. Discover your content
    2. Crawl it and extract information from it
    3. Add this information to their knowledge graph
    4. Answer questions in reference to this content

    Common questions about Generative Engine Optimization (GEO):

    Why don’t we just call it SEO?

    Most people assume that “search” relates to a specific search engine such as Google, Bing, DuckDuckGo and so on. A GPT doesn’t “search” for specific keywords / queries within an index of information. Instead, it tries to understand the intent of the prompt and deliver knowledge based on that.

    One thing you REALLY need to understand about modern LLMs:

    Modern GPT-style LLMs often break prompts into search queries and run searches to find the most up-to-date information. They often look only at the first SERP (Search Engine Result Page) and then refresh the content they have. In essence, if the topic has some form of freshness, i.e. it is newer than the last update of the model (GPT 4.0 is over 1 year old), then they will perform a search. Put simply, if you are not ranking on the first page of a search engine, you will not be in the results.

    Does this mean SEO is not dead?

    100%. Without search, all models would be stuck with old and often out-of-date data. Search is currently the only way the models can still give relevant information.

    Does optimizing for SEO improve LLM visibility?

    In most cases, yes!