"Prompt Volume" is Bad Data

"Prompt Volume" is Bad Data

Marketing teams should avoid solutions built on prompt volume data. It's attractive at first glance, but unethical, misleading, and useless.

Keller Maloney

Unusual - Founder

Apr 19, 2026

If you're a marketer trying to understand how buyers use AI, you want real prompts. You want to know what your buyers actually type, not what a vendor guessed they might have typed. Guessing what users ask AI is one of the weakest parts of AI brand tracking, and wanting to replace guesses with the real thing is a reasonable response to that weakness.

A handful of AEO and GEO vendors sell what they describe as real prompt data: millions of prompts harvested from real users interacting with real AI models. On paper, this is the exact dataset marketers wish they had. In practice, it's not what it claims to be. Security research published this month pins down where the data actually comes from: it is harvested in secret by Chrome VPN extensions, without their users' consent.

The kicker is that even if the data were legitimately obtained, it would still be useless for strategy. I'll walk through both the harvesting and the problems with the dataset itself below.

Where the data comes from

Koi Security found that Urban VPN, a Chrome extension with over 6 million users and a "Featured" badge in both the Chrome and Edge stores, has been silently recording its users' AI conversations since a July 2025 auto-update. Across Urban VPN and its sister extensions from the same publisher, over 8 million users are affected.

The captured data includes prompts, model responses, and timestamps across major AI providers. The extension intercepts that traffic and ships it to a data broker named BiScience, which repackages and resells it as "marketing analytics." AI search analytics vendors buy the data from BiScience and sell it on to their customers.

OpenAI does not release prompt data. Neither does Anthropic, Google, or any other major AI provider. There is no legitimate, at-scale source of real user prompts for purchase.

Why the sample doesn't describe your buyers

Even with full access to every prompt in the dataset, a marketer should pause over four structural problems.

The sample is biased. The dataset isn't a cross-section of AI users. It's a cross-section of people who installed a shady VPN extension. That population skews heavily toward users trying to evade paywalls, scrapers, and the consumer fringe of the internet. An enterprise buyer evaluating a six-figure SaaS contract is unlikely to be in that population.

Prompt tracking is fragile. I ran an experiment that swapped a single meaning-preserving synonym in each of 100 CRM prompts ("best" for "top," "scalable" for "high-volume") and reran them. A single brand's share of voice moved by as much as 17% between the two versions, 33% of the vendors in a typical answer changed, and only 16 of 100 prompt pairs produced identical vendor sets. Harvested prompts reflect the exact phrasings one specific group of users happened to type, and your buyers phrase things differently.
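The comparison behind that experiment can be sketched in a few lines. Assuming you have already extracted the vendor names from each pair of answers (the extraction step and the vendor sets below are illustrative, not the real experimental data), the stability check is just set comparison:

```python
# Sketch of the stability check described above: given vendor sets
# extracted from paired answers (original prompt vs. a one-synonym
# variant), measure how often the vendor sets actually match.

def vendor_overlap(original: set[str], variant: set[str]) -> float:
    """Jaccard similarity between two vendor sets (1.0 = identical)."""
    if not original and not variant:
        return 1.0
    return len(original & variant) / len(original | variant)

# Hypothetical results for three prompt pairs (illustrative only).
pairs = [
    ({"Salesforce", "HubSpot", "Pipedrive"}, {"Salesforce", "HubSpot", "Zoho"}),
    ({"Salesforce", "HubSpot"},              {"Salesforce", "HubSpot"}),
    ({"HubSpot", "Zoho", "Freshsales"},      {"Salesforce", "Monday", "Zoho"}),
]

identical = sum(1 for a, b in pairs if a == b)
avg_overlap = sum(vendor_overlap(a, b) for a, b in pairs) / len(pairs)
print(f"identical vendor sets: {identical}/{len(pairs)}")
print(f"average overlap: {avg_overlap:.2f}")
```

Run over real prompt pairs, the same two numbers (identical-set rate and average overlap) are what the 16-of-100 and 33% figures summarize.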

The data is too narrow. These datasets count individual prompts, not full conversations. The average ChatGPT conversation is eight messages long. Buyers start broad, layer in constraints like budget, integrations, and team size, and only accept a recommendation once the model has absorbed their full context. A dataset of one-off prompts cannot reconstruct the conversation around them.

You can't see the user's context. AI models personalize their responses to the user asking. A thousand people asking "what car should I buy?" receive a thousand different answers (an SUV for a family, a sports car for a single man, an EV for a San Franciscan). Even with a perfect list of buyer prompts, you cannot see what the model is telling those buyers about you, because the answer changes with every asker.

What's actually worth measuring

Prompt volume data is an attempt to answer a real question with a broken method. The question ("what is AI telling my buyers?") is worth answering. The method fails because the data is drawn from a population that doesn't look like your buyers, and because isolated prompts stripped of their surrounding conversation cannot reproduce the way AI is actually used.

A better approach starts from the other end. Instead of trying to reconstruct buyer prompts, try to understand patterns in the model's behavior across a deliberate spread of plausible scenarios: how it reasons, what sources it pulls from, and what it thinks matters to your buyers.
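A minimal sketch of what a "deliberate spread of scenarios" means in practice: rather than one prompt, build a grid of plausible buyer contexts and generate a probe for each combination. The scenario dimensions and wording below are assumptions for illustration; the actual model call is left out.

```python
# Build a grid of plausible buyer contexts (team size x budget x need)
# and generate one probe prompt per combination. Sending each prompt
# to a model and analyzing the answers is the measurement step.
from itertools import product

team_sizes = ["5-person team", "50-person team", "500-person org"]
budgets = ["a budget under $10k/yr", "a budget of $10k-$100k/yr",
           "an enterprise budget"]
needs = ["tight Salesforce integration", "EU data residency"]

scenarios = [
    f"We're a {team} with {budget} that needs {need}. "
    "What CRM should we choose, and why?"
    for team, budget, need in product(team_sizes, budgets, needs)
]

print(len(scenarios))  # 3 * 3 * 2 = 18 probe prompts
print(scenarios[0])
```

The point of the grid is coverage: each answer is one sample of the model's beliefs under a specific context, and the spread, not any single prompt, is what you analyze.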

The dataset you want isn't a pile of prompts. It's a map of what beliefs a model has about your solution.