"Prompt Volume" is Bad Data

"Prompt Volume" is Bad Data

Marketing teams should avoid solutions built on prompt volume data. It's attractive at first glance, but unethical, misleading, and useless.

Keller Maloney

Unusual - Founder

Apr 19, 2026

If you're a marketer trying to understand how buyers use AI, you want real prompts. You want to know what your buyers actually type, not what a vendor guessed they might have typed. Guessing what users ask AI is one of the weakest parts of AI brand tracking, and wanting to replace guesses with the real thing is a reasonable response to that weakness.

A handful of AEO and GEO vendors sell what they describe as real prompt data: millions of prompts harvested from real users interacting with real AI models. On paper, this is the exact dataset marketers wish they had. In practice, it's not what it claims to be. Security research published this month pins down where the data actually comes from: it is harvested in secret by Chrome VPN extensions, without their users' consent.

The kicker is that even if the data were legitimately obtained, it would still be useless for strategy. I'll walk through both the harvesting and the problems with the dataset itself below.

Where the data comes from

Koi Security found that Urban VPN, a Chrome extension with over 6 million users and a "Featured" badge in both the Chrome and Edge stores, has been silently recording its users' AI conversations since a July 2025 auto-update. Across Urban VPN and its sister extensions from the same publisher, over 8 million users are affected.

The captured data includes prompts, model responses, and timestamps across major AI providers. The extension intercepts that traffic and ships it to a data broker named BiScience, which repackages and resells it as "marketing analytics." AI search analytics vendors buy the data from BiScience and sell it on to their customers.

OpenAI does not release prompt data. Neither does Anthropic, Google, or any other major AI provider. There is no legitimate, at-scale source of real user prompts for purchase.

Why the sample doesn't describe your buyers

Even with full access to every prompt in the dataset, a marketer should pause over four structural problems.

The sample is biased. The dataset isn't a cross-section of AI users. It's a cross-section of people who installed a shady VPN extension. That population skews heavily toward users trying to evade paywalls, scrapers, and the consumer fringe of the internet. An enterprise buyer evaluating a six-figure SaaS contract is unlikely to be in that population.

Prompt tracking is fragile. I ran an experiment that swapped a single meaning-preserving synonym in each of 100 CRM prompts ("best" for "top," "scalable" for "high-volume") and reran them. A single brand's share of voice moved by as much as 17% between the two versions, 33% of the vendors in a typical answer changed, and only 16 of 100 prompt pairs produced identical vendor sets. Harvested prompts reflect the exact phrasings one specific group of users happened to type, and your buyers phrase things differently.
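The comparison behind that experiment can be sketched in a few lines. Assuming you have already extracted the vendor names from each pair of answers (the extraction step and the vendor sets below are illustrative, not the real experimental data), the stability check is just set comparison:

```python
# Sketch of the stability check described above: given vendor sets
# extracted from paired answers (original prompt vs. a one-synonym
# variant), measure how often the vendor sets actually match.

def vendor_overlap(original: set[str], variant: set[str]) -> float:
    """Jaccard similarity between two vendor sets (1.0 = identical)."""
    if not original and not variant:
        return 1.0
    return len(original & variant) / len(original | variant)

# Hypothetical results for three prompt pairs (illustrative only).
pairs = [
    ({"Salesforce", "HubSpot", "Pipedrive"}, {"Salesforce", "HubSpot", "Zoho"}),
    ({"Salesforce", "HubSpot"},              {"Salesforce", "HubSpot"}),
    ({"HubSpot", "Zoho", "Freshsales"},      {"Salesforce", "Monday", "Zoho"}),
]

identical = sum(1 for a, b in pairs if a == b)
avg_overlap = sum(vendor_overlap(a, b) for a, b in pairs) / len(pairs)
print(f"identical vendor sets: {identical}/{len(pairs)}")
print(f"average overlap: {avg_overlap:.2f}")
```

Run over real prompt pairs, the same two numbers (identical-set rate and average overlap) are what the 16-of-100 and 33% figures summarize.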

The data is too narrow. These datasets count individual prompts, not full conversations. The average ChatGPT conversation is eight messages long. Buyers start broad, layer in constraints like budget, integrations, and team size, and only accept a recommendation once the model has absorbed their full context. A dataset of one-off prompts cannot reconstruct the conversation around them.

You can't see the user's context. AI models personalize their responses to the user asking. A thousand people asking "what car should I buy?" receive a thousand different answers (an SUV for a family, a sports car for a single man, an EV for a San Franciscan). Even with a perfect list of buyer prompts, you cannot see what the model is telling those buyers about you, because the answer changes with every asker.

What's actually worth measuring

Prompt volume data is an attempt to answer a real question with a broken method. The question ("what is AI telling my buyers?") is worth answering. The method fails because the data is drawn from a population that doesn't look like your buyers, and because isolated prompts stripped of their surrounding conversation cannot reproduce the way AI is actually used.

A better approach starts from the other end. Instead of trying to reconstruct buyer prompts, try to understand patterns in the model's behavior across a deliberate spread of plausible scenarios: how it reasons, what sources it pulls from, and what it thinks matters to your buyers.
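A minimal sketch of what a "deliberate spread of scenarios" means in practice: rather than one prompt, build a grid of plausible buyer contexts and generate a probe for each combination. The scenario dimensions and wording below are assumptions for illustration; the actual model call is left out.

```python
# Build a grid of plausible buyer contexts (team size x budget x need)
# and generate one probe prompt per combination. Sending each prompt
# to a model and analyzing the answers is the measurement step.
from itertools import product

team_sizes = ["5-person team", "50-person team", "500-person org"]
budgets = ["a budget under $10k/yr", "a budget of $10k-$100k/yr",
           "an enterprise budget"]
needs = ["tight Salesforce integration", "EU data residency"]

scenarios = [
    f"We're a {team} with {budget} that needs {need}. "
    "What CRM should we choose, and why?"
    for team, budget, need in product(team_sizes, budgets, needs)
]

print(len(scenarios))  # 3 * 3 * 2 = 18 probe prompts
print(scenarios[0])
```

The point of the grid is coverage: each answer is one sample of the model's beliefs under a specific context, and the spread, not any single prompt, is what you analyze.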

The dataset you want isn't a pile of prompts. It's a map of what beliefs a model has about your solution.