Your AI "Share of Voice" is Meaningless

Keller Maloney

Unusual - Founder

Apr 20, 2026

If 100 people asked the same question to a search engine, they would all get the same results. If 100 people answered the same question, you'd get maybe five or ten different answers (the whole premise of Family Feud). If 100 AI models answered the same question, you'd get 100 different answers, each personalized to whoever is asking.

That's the physics underneath every AI conversation, and it breaks the measurement framework the industry has been building on top of it.

AI "share of voice" is meant to describe how often a brand appears in AI answers. It is computed by taking a hand-picked set of prompts, like "What's the best CRM?", running those prompts through various AI systems, and counting how often the brand is mentioned in the responses.
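As a minimal sketch of that computation (with a hypothetical `query_model` standing in for a real model API, and a canned response in place of live output):

```python
# Minimal sketch of a share-of-voice calculation.
# `query_model` is a hypothetical stand-in for a real model API call;
# the canned response below is illustrative, not live output.
def query_model(prompt: str) -> str:
    canned = {
        "What's the best CRM?": "Popular options include Salesforce, HubSpot, and Pipedrive.",
    }
    return canned.get(prompt, "")

def share_of_voice(brand: str, prompts: list[str]) -> float:
    """Fraction of prompts whose response mentions the brand."""
    mentions = sum(brand.lower() in query_model(p).lower() for p in prompts)
    return mentions / len(prompts)

print(share_of_voice("HubSpot", ["What's the best CRM?"]))  # 1.0 for this canned prompt set
```

Note that everything the metric reports flows from the hard-coded prompt list: change the list and the number changes with it.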

Share of voice has become a core metric for AI brand tracking. Some brands are launching PR campaigns, commissioning content, and rebuilding their websites to move their share of voice upward.

Unfortunately for these brands, share of voice is a false idol. It doesn't measure anything in the real world. This means that optimizing for it will not increase their market share or pipeline. Here's why.

What's wrong with share of voice?

The prompt set is constructed by hand

Share of voice is a fraction: the number of times AI mentions your brand divided by the total number of prompts you or a vendor picked. The denominator determines your share of voice, so that list of prompts becomes the ground truth for every decision the metric informs.

No one can credibly claim those prompts represent what your buyers are actually asking. Some vendors claim privileged access to real prompt data harvested from real users. That data is hopelessly biased, fragmented, and often illegally obtained.

Real buyers don't use the wording you'd pick. Your prompt set is a guess about their behavior, and your share of voice is your performance on that guess.

Prompt tracking is fragile

I ran an experiment to test how stable share of voice is. I ran 100 prompts about CRMs through AI models, like "what's the best CRM?" I recorded the responses and each brand's share of voice. Then I swapped exactly one word in each prompt for a synonym (e.g. "best CRM" became "top CRM"). The swap did not change the meaning of the prompt, so I did not expect the responses to change much.

The results were wildly unstable: a brand's share of voice moved by as much as 17% from the original prompts to the synonym-swapped prompts, 33% of the vendors in a typical answer changed identity between versions, and only 16 of 100 prompt pairs produced identical vendor sets.
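This kind of instability can be quantified with a simple set comparison. The sketch below uses invented responses rather than real model output, and Jaccard similarity as one reasonable overlap measure:

```python
# Sketch of the stability check described above: compare the vendor set
# returned for a prompt against the set returned for a one-word synonym swap.
# The two responses here are illustrative, not real model output.

def vendor_set(response: str, vendors: list[str]) -> set[str]:
    """Which known vendors appear in a response."""
    return {v for v in vendors if v.lower() in response.lower()}

def overlap(a: set, b: set) -> float:
    """Jaccard similarity: 1.0 means identical vendor sets."""
    return len(a & b) / len(a | b) if a | b else 1.0

VENDORS = ["Salesforce", "HubSpot", "Pipedrive", "Zoho", "Monday"]
resp_best = "Top picks: Salesforce, HubSpot, Pipedrive."   # "best CRM"
resp_top = "Consider Salesforce, Zoho, and Monday."        # "top CRM"

a = vendor_set(resp_best, VENDORS)
b = vendor_set(resp_top, VENDORS)
print(overlap(a, b))  # 0.2 — only Salesforce survives the one-word swap
```

If a metric is meaningful, a semantically identical prompt should score close to 1.0 on a check like this; the experiment above found it usually doesn't.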

Profound, one of the loudest proponents of prompt-tracking and share of voice, accidentally published research showing that 40-60% of the domains cited in AI responses are completely different one month later, even for identical prompts. They reframed this as "volatility" that brands need to manage. It's actually evidence that the system of measurement they're using is flawed.

The number of potential prompts is too big

The deeper problem is that the space of prompts your buyers could actually ask is combinatorially huge, and it's much bigger than a search-era intuition suggests. The average Google search is only 3-4 words long. The average ChatGPT prompt is typically 20 words or more, and often much longer than that.

That difference matters enormously when you start multiplying out possibilities. A few words of flexibility in a search query opens up maybe thousands of plausible phrasings. Twenty-plus words of flexibility in a prompt opens up a number of possibilities greater than the grains of sand on Earth. Push it a little further and you're past the number of atoms in the observable universe.
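The arithmetic behind that claim is back-of-envelope simple. Assuming a (conservatively small) ten plausible word choices per slot:

```python
# Back-of-envelope arithmetic, assuming ~10 plausible word choices per slot.
# The per-slot count is an assumption for illustration, not measured data.
CHOICES_PER_WORD = 10

search_query = CHOICES_PER_WORD ** 4     # ~4-word search query
chat_prompt = CHOICES_PER_WORD ** 20     # ~20-word chat prompt

print(f"{search_query:,}")   # 10,000 plausible phrasings
print(f"{chat_prompt:.0e}")  # 1e+20 — roughly grains-of-sand-on-Earth scale
```

Even with this deliberately small per-slot estimate, the 20-word case lands at 10^20, in the neighborhood of common estimates for the number of grains of sand on Earth.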

And that's just the first prompt. Every meaningful AI conversation is a sequence of prompts, each one branching off the last and shaped by context the model already has about the user. The real space of plausible buyer interactions is larger than any intuitive comparison can convey.

Sampling 100 points from that space and calling it "share of voice" is closer to scooping 100 grains of sand from a beach and claiming you measured the beach.

It doesn't account for how people actually use AI

Even granting a perfect prompt set, share of voice measures the output of a single query. Nobody uses AI that way. Real buyers have long, multi-turn conversations that average eight or more messages, where they introduce constraints, push back on recommendations, add context the model didn't have a minute ago, and narrow their question as they go.

The model's recommendation at turn one is almost never its recommendation at turn eight. By the time a buyer is close to making a decision, the conversation has been refined, qualified, and re-framed several times. Share of voice attempts to capture the first turn. The decision happens later, in a conversation the metric was never designed to see.

It doesn't account for personalization

Within any single turn, there is no single "answer" to measure a share of. The same prompt from 100 different people yields 100 different responses, each shaped by context the model has about the person asking: prior conversations, role, company, preferences, constraints they mentioned last week. Two VPs of sales asking the identical question get materially different recommendations, because the model is drawing on a personalized picture of each of them.

Share of voice averages these responses into a single number. The assumption behind that average is that the responses it samples are representative, when in fact the distribution is shaped entirely by user context the tracking tool cannot see.
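A toy example makes the problem concrete (the personalized answers below are invented for illustration):

```python
# Illustrative only: each user's context produces a different personalized answer,
# and the average hides which user would actually see the brand.
responses_by_user = {
    "vp_sales_enterprise": ["Salesforce", "HubSpot"],
    "vp_sales_startup":    ["Pipedrive", "HubSpot"],
    "solo_founder":        ["Notion", "Airtable"],
}

def mention_rate(brand: str) -> float:
    """Fraction of personalized answers that include the brand."""
    hits = sum(brand in recs for recs in responses_by_user.values())
    return hits / len(responses_by_user)

print(mention_rate("HubSpot"))  # 2/3 across users, but any single user sees it or doesn't
```

A tracking tool querying from a clean, context-free account samples none of these user-conditioned distributions; it measures a response no real buyer ever receives.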

This is the Family Feud observation made concrete. When a system has no single answer to give, a metric that reports your share of "the answer" is measuring a fiction.

The takeaway

It's alarming to watch brands make real strategy decisions off this constructed metric. They're not measuring what they think they're measuring, and they're not measuring what actually matters.

This problem isn't unique to share of voice. Every rate-based metric in this category (citation rate, mention rate, visibility score) is a fraction calculated against a hand-picked denominator, which means it inherits every problem described above. Any "rate" you can put on a dashboard is only as honest as the prompts behind it, and in this domain those prompts are always constructed.

The metric moves. The pipeline doesn't always follow. That gap is where most AI brand strategy is currently living, and it's a strange place to build a budget.