A synthesis of four papers from the Unusual.ai research series (more than 37,000 survey questions run against ChatGPT and Claude) and what they imply for established brands marketing into AI commerce.

Will Jack
Unusual - Founder
Across the four papers in our research series, we ran a large-scale survey of more than 37,000 commercially framed buyer queries against ChatGPT and Claude, spanning 19 sectors. Two clear takeaways emerged:
First, for mid-market and established brands, AI commerce is a belief problem rather than a discoverability problem.
Second, the standard AEO/GEO playbook — track mention rates, optimize for share of voice, publish more comparison pages — may work well for small brands, but it is insufficient for established brands.
The research also tells us what the right playbook for established brands looks like. That playbook applies AI interpretability techniques to AI commerce: tools that give us a lens into how models think about brands and the markets they compete in, and that show our clients how to position their products and marketing for an AI audience.
If you lead marketing at a category leader, an established challenger, or a mid-market player with meaningful share, AI already knows you exist; your years of hard SEO work have solved that for you. When AI doesn't choose you, the reason lies in its beliefs about you, its intrinsic preferences, or its pre-existing biases.
The question, therefore, becomes: what does AI believe about you, and what does it value when making a recommendation or purchasing decision?
What sounds like a soft branding question is, in fact, the hard one. Visibility was the easy half of the brand problem in AI, and for established brands it is largely behind you. Belief and values are the operative half, and the standard AEO and GEO tools your team has been pitched don't measure them.
What the research found
1. The bottleneck is different at every brand tier
When AI recommends brands in a category, it runs a short funnel — retrieve a set of candidates, compare them on the dimensions that matter for the buyer, name the winners — and brands at different prominence tiers drop out of that funnel at different points. L1 category leaders are losing the comparison, not the visibility. L2 challengers lose to persona-mediated substitution. L3 mid-market brands face both retrievability and alignment work at once. The right marketing investment depends on which tier you sit at — there is no uniform AEO recipe that wins across the ladder.
We labeled 533 brands across 19 commercial sectors by prominence: L1 category leaders (Salesforce, HubSpot, Datadog), L2 established challengers (Pipedrive, Gusto, Trello), L3 mid-market (Copper, Sentry, Deel), and two long-tail tiers below (L4 and L5). For each brand we measured where in the AI's funnel it dropped out: was it retrieved at all? Did it appear in the answer? Was it actually recommended?
The bottleneck splits cleanly along the ladder. L1 brands appear in nearly every relevant retrieval (they're not failing to be found) but win only 25–41% of the slots they reach. L2 brands carry the highest conversion rates of any tier (37–52%) but lose to substitution when the buyer is described differently. L3 mid-market sits at the inflection point: both retrievability and alignment are active concerns, conversion drops to 34–40%, and persona effects are at their strongest.
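To make the funnel concrete, here is a minimal sketch of how per-tier drop-out could be tallied. It assumes each trial logs three booleans per brand (retrieved, appeared, recommended) and that "conversion" means recommendations divided by answer appearances. The `Trial` record, the `funnel_by_tier` helper, and the toy rows are hypothetical illustrations, not the papers' code or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    """One brand's outcome in one AI answer (hypothetical record format)."""
    tier: str          # prominence tier, e.g. "L1"
    retrieved: bool    # among the candidates the model retrieved?
    appeared: bool     # named anywhere in the answer?
    recommended: bool  # actually recommended?


def funnel_by_tier(trials):
    """Per tier: retrieval rate, and conversion = recommended / appeared (assumed definition)."""
    counts = defaultdict(lambda: {"n": 0, "retrieved": 0, "appeared": 0, "recommended": 0})
    for t in trials:
        c = counts[t.tier]
        c["n"] += 1
        c["retrieved"] += t.retrieved  # bools add as 0/1
        c["appeared"] += t.appeared
        c["recommended"] += t.recommended
    return {
        tier: {
            "retrieval_rate": c["retrieved"] / c["n"],
            # Conversion only counts slots the brand actually reached.
            "conversion": c["recommended"] / c["appeared"] if c["appeared"] else 0.0,
        }
        for tier, c in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical toy trials, not data from the papers.
    trials = [
        Trial("L1", True, True, True),
        Trial("L1", True, True, False),
        Trial("L1", True, True, False),
        Trial("L1", True, True, False),
        Trial("L3", False, False, False),
        Trial("L3", True, True, True),
    ]
    for tier, stats in funnel_by_tier(trials).items():
        print(tier, stats)
```

Read this way, a brand with a high retrieval rate but low conversion is losing the comparison (the L1 pattern above), while a low retrieval rate points to retrievability work first.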
For an established brand, the work isn't "show up in more searches"; that's mostly behind you. The work is shaping what AI believes about you when it encounters your category: which dimensions of comparison it weighs, what evidence shapes its view, which buyer segments it maps you to, and against which competitors.
2. Tracking AI mentions isn't tied to revenue
We don't think mention rate, citation rate, or share-of-voice are useful headline metrics for AI commerce. They're disconnected from the only metric that matters for established brands: revenue downstream of an AI surface. A brand that appears more often in a tracker's dashboard isn't necessarily a brand that wins more deals — and a brand whose tracker number falls 8% in a week isn't necessarily a brand that's actually losing share.
There are two reasons mention-rate trackers are a weak signal. The first is sensitivity to how the buyer phrases the question. We measured what happens when the same buyer intent is expressed two different ways, for instance "best CRM" versus "best CRM for SaaS startups." The recommendation sets overlap on as little as 14% of brands. Compare that to running the same prompt twice on the same model on the same day, where overlap sits at 50–61%. The variation between two natural rewordings of the same question is far larger than the noise floor, and your real buyers don't phrase their questions the same way as one another, let alone the way a tracker's fixed prompt corpus does.
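The overlap metric behind these figures isn't specified here, so the sketch below assumes Jaccard similarity: brands named in both answers divided by brands named in either. Under that assumption, the noise floor is the mean overlap across repeated runs of one prompt, and the paraphrase effect is the overlap between answers to two rewordings. The helper names are hypothetical and the brand sets are placeholder letters, not measured results.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of brands named in either set that are named in both (assumed metric)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty answers agree trivially
    return len(a & b) / len(a | b)


def mean_pairwise_overlap(runs):
    """Noise floor: average Jaccard overlap over all pairs of repeated runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs")
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical: the same prompt run three times on one model.
    repeats = [{"A", "B", "C", "D"}, {"A", "B", "C", "E"}, {"A", "B", "D", "E"}]
    # Hypothetical: answers to two rewordings of the same buyer intent.
    generic = {"A", "B", "C", "D"}
    niche = {"A", "F", "G", "H"}
    print(f"noise floor: {mean_pairwise_overlap(repeats):.2f}")  # 0.60
    print(f"paraphrase overlap: {jaccard(generic, niche):.2f}")  # 0.14
```

A tracker that samples one fixed phrasing only ever sees a single point from that paraphrase distribution.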
The second, deeper reason: whatever number the tracker reports, it conflates three different upstream causes. A drop in mention rate could mean (a) AI is no longer retrieving you, (b) AI is retrieving you but no longer picking you, or (c) random paraphrase or persona noise. The interventions that fix each of these are completely different, and a single-number dashboard cannot tell you which lever to pull.
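Causes (a) and (b) can be separated with simple arithmetic if retrieval is observable, because mention rate factors as retrieval rate times pick rate given retrieval. The sketch below assumes per-run logs of two booleans (retrieved, mentioned) for one brand; the `decompose` helper and the weekly logs are hypothetical. It cannot isolate cause (c) on its own: that still requires comparing any change against the same-prompt noise floor.

```python
def decompose(runs):
    """Split mention rate into retrieval rate and pick rate given retrieval.

    runs: list of (retrieved, mentioned) booleans, one per prompt run
    (hypothetical log format). Assumes a brand can only be mentioned
    if it was retrieved.
    """
    if not runs:
        raise ValueError("no runs to decompose")
    n = len(runs)
    retrieved = sum(1 for r, _ in runs if r)
    mentioned = sum(1 for r, m in runs if r and m)
    retrieval_rate = retrieved / n
    pick_rate = mentioned / retrieved if retrieved else 0.0
    mention_rate = mentioned / n  # same as retrieval_rate * pick_rate, up to rounding
    return retrieval_rate, pick_rate, mention_rate


if __name__ == "__main__":
    T, F = True, False
    # Hypothetical logs: both weeks show the same mention-rate drop (0.40 -> 0.30).
    baseline = [(T, T)] * 4 + [(T, F)] * 4 + [(F, F)] * 2
    week_a = [(T, T)] * 3 + [(T, F)] * 3 + [(F, F)] * 4  # retrieval fell
    week_b = [(T, T)] * 3 + [(T, F)] * 5 + [(F, F)] * 2  # picks fell
    for name, runs in [("baseline", baseline), ("week_a", week_a), ("week_b", week_b)]:
        r, p, m = decompose(runs)
        print(f"{name}: retrieval={r:.2f} pick={p:.3f} mention={m:.2f}")
```

Both hypothetical weeks report the same mention rate, yet one calls for retrievability work and the other for changing how the model compares you.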
Our metric of choice is ROI-tied: actual buyer behavior — click-through, signup, qualified-pipeline contribution, revenue — downstream of the AI surface. Real buyers issue their own paraphrases of every intent; they convert or they don't; the buyer-behavior signal aggregates over the whole paraphrase distribution that actually exists in the wild. ROI integrates over the noise that mention rates magnify, and ties directly to the question that matters: is AI driving customers to your business?
3. AI segments your buyers by who they say they are
When the same prompt is asked by buyers who describe themselves differently ("I'm a solo founder bootstrapping" vs "I'm a VP at an enterprise" vs "I'm a UK SMB owner"), AI gives them materially different recommendation sets. Across 2,000 trials, we found the effect concentrates in the mid-market: L3 brands swap up to 75% of recommended brands as the persona changes, while L1 category leaders are persona-resistant (~80% same-brand consistency).
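One way to express the persona effect as a number, assuming "same-brand consistency" means the share of a baseline answer's recommended brands that survive when the persona changes, with the swap rate as its complement (the papers' exact definition may differ). The `persona_consistency` helper, persona labels, and brand sets are hypothetical placeholders.

```python
def persona_consistency(baseline, by_persona):
    """Per persona: share of baseline brands still recommended, and the swap rate.

    Hypothetical helper; consistency here is retention of the baseline set.
    """
    base = set(baseline)
    if not base:
        raise ValueError("baseline recommendation set is empty")
    report = {}
    for persona, recs in by_persona.items():
        kept = len(base & set(recs)) / len(base)
        report[persona] = {"consistency": kept, "swap_rate": 1 - kept}
    return report


if __name__ == "__main__":
    baseline = {"A", "B", "C", "D"}  # hypothetical answer with no persona stated
    variants = {
        "solo founder": {"A", "E", "F", "G"},
        "enterprise VP": {"A", "B", "C", "H"},
    }
    for persona, scores in persona_consistency(baseline, variants).items():
        print(persona, scores)
```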
The implication for marketers: AI is already running a segmentation of your buyer base that you didn't design and your tracker can't see. A buyer who frames themselves as a founder gets one set of recommendations; the same buyer framed as an enterprise executive gets a different one. Moreover, the models reason differently for different personas, so any alignment work that isn't segment-aware will be misaligned with large portions of the buyer population.
4. The diagnosis transfers — across providers, and broadly across brand tiers
When AI doesn't recommend you, the underlying reason is largely the same across providers and across brand tiers. The implication: influencing what AI believes about your brand is largely one playbook, not a separate one for each provider or tier. You diagnose once, and the prescription transfers.
When neither ChatGPT nor Claude recommends a brand, both providers diagnose the underlying reason the same way 95.1% of the time overall, and 99%+ for less-prominent brands. The recommended brands differ — about a third of brands recommended by either provider appear on both — but the underlying belief failure that produced the omission is shared. Diagnostic agreement holds across every tier of the L1–L5 ladder, with the strongest convergence at lower tiers and modest divergence on the most prominent brands.
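The agreement figure is a simple ratio once both providers' diagnoses are labeled. The sketch below assumes each omission case is tagged with a tier and one categorical diagnosis per provider; the `diagnostic_agreement` helper, the label vocabulary (`lost_comparison`, `not_retrieved`, `persona_mismatch`), and the sample cases are hypothetical, not the papers' taxonomy or results.

```python
from collections import defaultdict


def diagnostic_agreement(cases):
    """Share of omission cases where both providers give the same diagnosis.

    cases: iterable of (tier, diagnosis_a, diagnosis_b), restricted to
    brand/prompt pairs that neither provider recommended (hypothetical format).
    Returns agreement overall and per tier.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for tier, diag_a, diag_b in cases:
        for key in ("overall", tier):
            totals[key] += 1
            matches[key] += (diag_a == diag_b)
    return {key: matches[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Hypothetical labeled omissions; labels are illustrative only.
    cases = [
        ("L1", "lost_comparison", "lost_comparison"),
        ("L1", "lost_comparison", "persona_mismatch"),
        ("L4", "not_retrieved", "not_retrieved"),
        ("L4", "not_retrieved", "not_retrieved"),
    ]
    print(diagnostic_agreement(cases))
```

This measures agreement on why a brand was omitted, which is separate from how much the two providers' recommended brand sets overlap.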
The new playbook for established brands: speak to what AI values, measure direct impact
For brands at the top of the market, AI already knows you exist, and the standard tracking tools can't tell you what to do about your AI strategy. What needs to change is what AI believes about your brand on the specific dimensions it uses to make recommendations in your category.
For an established brand, the metric that matters isn't how often AI mentions you. It's what AI believes about you, and how those beliefs tie into what it values when it makes a buying decision. The traditional AEO/GEO playbook has no way to measure these beliefs, and the metrics it does track don't tie to revenue.
The prescription has two parts. The first is AI interpretability: figure out which dimensions AI is actually using to differentiate brands in your category, find out where your brand sits on each, and target content and positioning interventions at the axes that need to move.
The second is moving your success metric away from mention rate, citation rate, and share-of-voice — and toward real-world ROI and attribution. The question that matters isn't how often AI says your name. It's which AI surfaces drove which conversions, which content placements actually showed up in real buyer journeys, and which interventions moved revenue. Belief-level diagnostics tell you what to do; ROI-tied attribution tells you whether what you did worked.
Talk to us
If you lead an established brand and want to move past share-of-voice into belief diagnostics and ROI-tied attribution, that's the work we do. We show you what AI currently believes about your brand on the dimensions that matter in your category, against which competitors, and for which buyer segments. We then tie that diagnostic to real-world buyer behavior (which AI surfaces are driving conversions, which interventions are moving revenue) so the right next moves become obvious and you can measure whether they worked.
The four underlying papers — on prominence-stratified failure modes, cross-provider convergence, paraphrase brittleness, and persona conditioning — are available [here].