---
title: "What Is GEO Measurement and Why Is It So Hard?"
date: "2026-01-23"
author: "Flavio Longato"
categories: ["Generative Engine Optimization Course", "GEO"]
url: "https://www.longato.ch/what-is-geo-measurement-why-is-it-hard/"
---

Most marketers talk about Generative Engine Optimisation as though it were just SEO with a new hat. But when you try to measure GEO performance, you quickly discover a problem: the numbers shift every time you look at them. AI responses are non-deterministic, region-dependent and shaped by personalisation. That makes reliable measurement genuinely difficult.

I have spent the past year tracking how brands appear inside AI-generated answers. In that time I have watched the same prompt return different brand mentions on Monday and Tuesday, from the same model, in the same location. If you are investing in GEO, you need to understand exactly where measurement stands today and where the gaps still sit.

Why AI Responses Keep Changing
------------------------------

Large language models are non-deterministic by design. Run the same prompt twice and you can get different wording, different sources cited and different brands mentioned. This is not a bug. It is how probabilistic text generation works: the model samples each token from a probability distribution rather than always choosing the single most likely one. Temperature settings, model updates and cached context all influence output.

For traditional search, you could pull rank-tracking data and trust it for a week. With GEO, a single data point tells you very little. You need repeated samples across time, regions and user states to build a picture you can act on. Google’s own documentation on [AI principles](https://ai.google/responsibility/principles/) acknowledges that model behaviour varies with context. That variation flows straight into your measurement challenge.
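To make the "single data point tells you very little" argument concrete, here is a minimal sketch that turns repeated samples into a mention rate with an uncertainty range. It assumes you have already counted how many sampled responses mentioned your brand. The function name `wilson_interval` is my own, and the Wilson score interval is one reasonable choice of method, not something any particular GEO tool is known to use.

```python
import math


def wilson_interval(mentions: int, samples: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a mention rate (z = 1.96 is roughly 95%)."""
    if samples <= 0:
        raise ValueError("need at least one sample")
    p = mentions / samples
    denom = 1 + z**2 / samples
    centre = (p + z**2 / (2 * samples)) / denom
    half = z * math.sqrt(p * (1 - p) / samples + z**2 / (4 * samples**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)


# One sample that happened to mention the brand: the interval is very wide.
single_low, single_high = wilson_interval(1, 1)

# Forty samples with twelve mentions: the interval narrows around 30%.
low, high = wilson_interval(12, 40)
```

With one sample the plausible range spans most of the scale. With forty it tightens, but it is still far from a precise number, which is the point.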

Region and Personalisation Add Noise
------------------------------------

Location matters more than most people realise. A prompt about “best project management tools” will surface different brands in Spain than in the United States. The model draws on regional training data, local popularity signals and language cues. If your measurement tool does not simulate queries from the correct region, your data is misleading from the start.

Personalisation adds another layer. When a user is logged into ChatGPT or Gemini, their conversation history and preferences shape the response. That means two users asking the same question can see completely different brand recommendations. Measuring “your” visibility in AI answers is therefore always an approximation. The best you can do is control for region, strip out personalisation where possible and sample frequently.

Here is my contrarian take: most GEO tools overstate their accuracy because they run a handful of prompts from a single location and call it visibility data. That is not measurement. That is a screenshot. Real measurement requires hundreds of prompt variations, multiple regions and longitudinal tracking. If a vendor cannot explain their sampling methodology, treat their numbers with scepticism.
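As a rough illustration of what a sampling methodology could look like, the sketch below expands prompt templates, categories and regions into a plan with several repeated runs per combination. Everything here is hypothetical: the function name `build_sampling_plan`, the template wording and the region codes are placeholders, and the code only builds the plan. It does not call any AI platform.

```python
from itertools import product


def build_sampling_plan(templates, categories, regions, runs_per_cell=5):
    """Return one entry per (prompt, region, run) combination."""
    plan = []
    for template, category, region in product(templates, categories, regions):
        prompt = template.format(category=category)
        # Repeat each cell because a single response is not representative.
        for run in range(runs_per_cell):
            plan.append({"prompt": prompt, "region": region, "run": run})
    return plan


# Placeholder inputs, chosen only to show the shape of the plan.
templates = ["What are the best {category}?", "Which {category} would you recommend?"]
categories = ["project management tools"]
regions = ["ES", "US", "CH"]

plan = build_sampling_plan(templates, categories, regions)
```

Running the same plan on a fixed schedule is what turns it into longitudinal tracking. A real plan would use far more prompt variations than this example.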

The Brand Detection Problem
---------------------------

One issue that caught me off guard early on was brand detection. It sounds simple: scan the AI response for your brand name and count mentions. But many brands share names with common words. Think “Apple” the fruit versus Apple the company, or “Teams” the Microsoft product versus teams of people.

When I first tested detection scripts against real AI outputs, false positives were everywhere. A response about workplace collaboration would mention “teams” five times without once referring to Microsoft Teams. You need entity disambiguation, not string matching. Tools like [those reviewed by Search Engine Journal](https://www.searchenginejournal.com/generative-engine-optimization-geo/524906/) are starting to address this, but it remains a weak spot across the industry.

The solution involves building custom detection layers that understand context. You look at surrounding words, the prompt category and the typical entities that appear together. It is slow, manual work. But without it, your visibility score is fiction.
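A minimal sketch of the "surrounding words" idea, assuming a hand-built list of context cues per brand. The cue list, the window size and the function name `count_brand_mentions` are all illustrative assumptions. This is a crude heuristic, not full entity disambiguation, and it will still make mistakes on real output.

```python
import re

# Hypothetical cue words; a real list would be curated per brand and category.
BRAND_CUES = {
    "Teams": {"microsoft", "meeting", "chat", "channel", "slack", "zoom"},
}


def count_brand_mentions(text: str, brand: str, window: int = 3) -> int:
    """Count occurrences of `brand` that have a cue word within `window` tokens."""
    cues = BRAND_CUES[brand]
    tokens = [t.lower() for t in re.findall(r"[A-Za-z0-9']+", text)]
    target = brand.lower()
    hits = 0
    for i, token in enumerate(tokens):
        if token != target:
            continue
        # Look at neighbouring tokens on both sides, excluding the match itself.
        neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if any(n in cues for n in neighbours):
            hits += 1
    return hits


generic = "Remote teams work best when teams share goals and teams meet weekly."
product_text = "Microsoft Teams and Slack are popular; many teams use Teams for chat."

generic_hits = count_brand_mentions(generic, "Teams")
product_hits = count_brand_mentions(product_text, "Teams")
```

Plain string matching would report three mentions in each sentence. The context check reports none in the first and two in the second, skipping the generic "many teams". Widening the window makes that generic use count as well, which is why this kind of tuning is slow, manual work.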

What You Can Measure Today
--------------------------

Despite these challenges, useful measurement is possible. Here is what works reasonably well right now:

- Brand mention frequency across a large sample of prompts related to your category.
- Sentiment analysis of how AI models describe your brand when they do mention it.
- Share of voice compared to competitors within specific prompt clusters.
- Regional differences in brand visibility across key markets.

These metrics give you directional insight. They tell you whether your GEO efforts are moving the needle. They do not yet give you the precision of a Google Search Console click report. That gap is real, and pretending otherwise helps nobody.
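Share of voice within prompt clusters can be sketched in a few lines, assuming each sampled response has already been reduced to a cluster label and a list of detected brands. The function name `share_of_voice`, the record layout and the brand names below are placeholders of mine, not a standard definition. Different tools may weight mentions differently.

```python
from collections import Counter, defaultdict


def share_of_voice(records):
    """records: iterable of (cluster, brands_mentioned), one per sampled response."""
    counts = defaultdict(Counter)
    for cluster, brands in records:
        # Count each brand at most once per response.
        counts[cluster].update(set(brands))
    shares = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        shares[cluster] = {b: n / total for b, n in counter.items()} if total else {}
    return shares


# Made-up records purely to show the calculation.
records = [
    ("pm-tools", ["BrandA", "BrandB"]),
    ("pm-tools", ["BrandA"]),
    ("pm-tools", ["BrandA", "BrandA", "BrandC"]),
    ("crm", []),
]

shares = share_of_voice(records)
```

Here `BrandA` holds three of the five brand appearances in the `pm-tools` cluster, so its share is 0.6. The `crm` cluster has no mentions and returns an empty result rather than a misleading zero.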

Tools built specifically for GEO tracking, such as those explored in [Moz’s research on AI search](https://moz.com/blog), are improving fast. Sampling methods are getting smarter. Regional simulation is more accurate. Brand detection is catching up. But we are still in the early innings.

Where GEO Measurement Goes From Here
------------------------------------

The trajectory is clear. As more organisations invest in GEO, the demand for reliable measurement will push tooling forward. I expect three shifts over the next twelve months. First, API-level access to AI platforms will allow real-time sampling at scale. Second, standardised metrics will emerge so that brands can benchmark across tools. Third, personalisation modelling will let you estimate visibility for different audience segments, not just a generic “average user.”

My experience working with early GEO data for clients across multiple sectors has taught me one thing above all: patience matters more than precision right now. Track trends, not snapshots. Compare quarters, not days. Build your measurement practice today so that when the tools catch up, you already have baseline data to measure against.
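To show what "compare quarters, not days" might look like in practice, this sketch pools daily sample counts into calendar quarters before computing a mention rate. The input layout and the function name `quarterly_mention_rate` are assumptions for illustration, and the numbers are invented.

```python
from collections import defaultdict
from datetime import date


def quarterly_mention_rate(daily):
    """daily: iterable of (day, mentions, samples); returns rate per calendar quarter."""
    totals = defaultdict(lambda: [0, 0])
    for day, mentions, samples in daily:
        key = f"{day.year}-Q{(day.month - 1) // 3 + 1}"
        totals[key][0] += mentions
        totals[key][1] += samples
    # Pool counts first, then divide, so busy days are weighted by sample size.
    return {q: m / s for q, (m, s) in sorted(totals.items()) if s}


# Invented daily counts: (date, responses mentioning the brand, responses sampled).
daily = [
    (date(2025, 1, 10), 3, 10),
    (date(2025, 3, 31), 5, 10),
    (date(2025, 4, 1), 2, 10),
]

rates = quarterly_mention_rate(daily)
```

Pooling before dividing smooths out the day-to-day swings, which are mostly noise, and leaves a trend you can compare from one quarter to the next.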

GEO measurement is messy, imperfect and improving fast. The brands that accept that reality and invest anyway will be the ones with a head start when the data finally sharpens.