
Keller Maloney
Unusual - Founder
Dec 23, 2025
A compelling pitch has emerged in the AI optimization space: what if you knew exactly what people were asking ChatGPT? If you had access to real user prompts—the actual queries your customers type into AI assistants—you could optimize for those queries the way you once optimized for Google searches.
This is the core value proposition of "prompt volume" data, offered by a growing number of AEO and GEO tools. The problem isn't that it sounds useful. It's that the data is broken in ways that make it nearly useless for actual strategy.
How prompt datasets are collected
OpenAI doesn't release prompt data. Neither does Anthropic, Google, or any other major AI provider. This is a good thing—imagine how much sensitive information lives in your ChatGPT history. Medical questions, financial anxieties, drafts of difficult conversations, relationship problems. The researcher who broke the story we're about to discuss put it well: he'd developed a level of candor with his AI assistant that he doesn't have with most people in his life.
So where do prompt datasets come from?
Chrome extensions. Koi Security published research this month showing that Urban VPN—a "Featured" extension with over 6 million users—has been secretly harvesting every AI conversation its users have had since July 2025. ChatGPT, Claude, Gemini, Copilot, Perplexity—all of it intercepted, compressed, and sold to data brokers for "marketing analytics purposes." Across Urban VPN and its sister extensions, over 8 million users are affected.
The data flows to BiScience, a broker that packages it into products for advertisers and, notably, into the prompt datasets that power AI search analytics tools. Users who installed a VPN extension for privacy woke up one day—after a silent auto-update—with new code harvesting their most intimate conversations.
The sampling problem
Even setting aside the ethics, this collection method creates a sampling problem that should concern anyone trying to draw strategic conclusions from prompt data.
You're not seeing "what people ask ChatGPT." You're seeing what people who also happened to install a shady VPN extension ask ChatGPT.
This is a textbook sampling fallacy. The demographics, use cases, and sophistication of this group almost certainly differ from actual buyers researching enterprise software or comparing SaaS tools. It's like surveying only people who answer calls from unknown numbers and extrapolating to the general population.
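A toy simulation makes the distortion concrete. Every number below is invented for illustration; the only point is that conditioning on "installed the extension" changes the population you end up measuring:

```python
import random

random.seed(0)

# Hypothetical population: 30% of ChatGPT users are B2B buyers,
# but installers of a shady VPN extension skew heavily consumer.
# All rates here are made up for illustration.
POP_B2B_RATE = 0.30
VPN_INSTALL_RATE = {"b2b": 0.02, "consumer": 0.10}

population = ["b2b" if random.random() < POP_B2B_RATE else "consumer"
              for _ in range(100_000)]

# The prompt dataset only sees users who installed the extension.
observed = [u for u in population if random.random() < VPN_INSTALL_RATE[u]]

true_rate = population.count("b2b") / len(population)
observed_rate = observed.count("b2b") / len(observed)
print(f"True B2B share:     {true_rate:.1%}")      # ~30%
print(f"Observed B2B share: {observed_rate:.1%}")  # ~8%
```

With these made-up rates, a segment that is 30% of the real population shows up as roughly 8% of the dataset. Any conclusion about "what buyers ask" inherits that skew.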
The combinatorial explosion
Even if the data were ethically collected and properly sampled, there's a deeper problem: almost every ChatGPT prompt is asked exactly once. There is no "head" of common queries to optimize for. The concept of "ranking for a query" stops making sense when no query repeats.
This follows from basic math. The average Google search is 3-4 words. The average ChatGPT prompt is around 23 words. This isn't 6x more complexity—it's exponentially more.
English has somewhere between 20,000 and 50,000 commonly used words. If we're conservative and say people draw from a working vocabulary of about 10,000 words when typing queries, then the number of possible 4-word combinations is roughly 10,000^4, or about 10^16. That's a lot, but search engines handle it because query patterns cluster. Millions of people search "best project management software."
A 23-word prompt, using the same vocabulary, has roughly 10,000^23, or about 10^92, possible combinations—a number so large it's effectively infinite. The search space isn't 6x bigger; it's bigger by a factor of 10,000^19, roughly 10^76. In SEO, we talked about long-tail keywords as a strategy. In AI prompts, it's all long-tail.
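You can sanity-check these numbers in a few lines of Python, using the same 10,000-word working vocabulary assumed above:

```python
from math import log10

VOCAB = 10_000  # working vocabulary assumed in the text

def space_log10(words: int) -> float:
    """log10 of the number of possible word sequences of a given length."""
    return words * log10(VOCAB)

google = space_log10(4)    # four-word Google query
chatgpt = space_log10(23)  # 23-word ChatGPT prompt

print(f"4-word queries:  ~10^{google:.0f}")   # ~10^16
print(f"23-word prompts: ~10^{chatgpt:.0f}")  # ~10^92
print(f"ChatGPT's space is 10^{chatgpt - google:.0f} times larger")  # 10^76
```

Even if real prompts cluster far more than a uniform model suggests, there is no plausible amount of clustering that collapses 10^92 sequences into a head of repeatable queries.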
The multi-turn problem
Prompt datasets capture individual messages, not conversations. But the moment of recommendation—the turn where the model actually picks a winner—is invisible in prompt-level data. You're seeing fragments of conversations with no way to reconstruct what led to the decision.
The average ChatGPT conversation is eight messages long. The recommendation rarely happens on the first turn. Consider how a typical buying conversation unfolds. A user starts broad: "What are the best project management tools?" The model sketches the landscape. Then the user adds a constraint: "Which ones integrate with Slack?" The model narrows. Another constraint: "Which is better for non-technical teams?" By the time a recommendation emerges, the model has accumulated context about the user's stack, team size, and preferences.
A prompt like "what about the pricing?" is meaningless without knowing what product the user was asking about. A prompt like "which one would you recommend?" tells you nothing if you don't know the constraints that preceded it.
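Here is a sketch of that data loss, using the made-up conversation from above. Flattening a conversation into independent prompt rows destroys exactly the context that makes the decisive turn interpretable:

```python
# A hypothetical buying conversation, in order, as the model sees it.
conversation = [
    "What are the best project management tools?",
    "Which ones integrate with Slack?",
    "Which is better for non-technical teams?",
    "What about the pricing?",  # <- the turn that decides the sale
]

# What a prompt-level dataset stores: each message as an independent
# row, mixed in with millions of other users' messages.
prompt_dataset = [{"prompt": turn} for turn in conversation]

# The decisive prompt, seen in isolation:
print(prompt_dataset[-1])  # {'prompt': 'What about the pricing?'}
# Pricing of *what*? The accumulated constraints (Slack, non-technical
# team) are gone, so there is no query to "rank" for and no intent
# left to reconstruct.
```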
You can't see the answers
This is the biggest problem. Even if you knew every prompt your customers were asking—perfectly sampled, complete conversation context included—you still wouldn't know what ChatGPT said in response. You're trying to optimize against a target you literally cannot see.
This is what makes prompt volume data fundamentally different from search data. Keyword research worked for Google because you could close the feedback loop. If you searched "best CRM for startups," you could see exactly what Google returned. You could see where you ranked, who ranked above you, and what content was winning. You could reverse-engineer what Google wanted and adjust your strategy accordingly.
You cannot do this with ChatGPT. And unlike Google, the model rarely gives the same answer twice. Responses are personalized to the user, shaped by conversation history, and influenced by the model's own sampling variability. There is no consistent "ranking" to observe. There is no SERP to screenshot.
What to do instead
The alternative is to zoom out.
Instead of chasing specific prompts that will never repeat, work at the level of topics and opinions. Google Search Console still tells you what themes your customers care about—are they asking about pricing, competitors, integrations, implementation timelines? That level of abstraction is actually useful.
Instead of trying to "rank" on queries, focus on shaping the model's opinion. AI models form latent views about brands: who you're for, what you're good at, how you compare to alternatives. These opinions emerge from aggregated content across the web—your documentation, case studies, third-party reviews, comparison pages. The goal isn't to win a keyword; it's to earn the recommendation when a buyer's constraints match your strengths.
And instead of relying on prompt datasets, measure what matters directly. Fire targeted prompts at models to understand how they currently perceive your brand. Track recommendation share in realistic scenarios. Watch whether your opinion gaps are closing over time. You can't see what your customers asked, but you can see what the model believes—and that's what determines whether you win.
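As a sketch of what "measure directly" can look like, here is a minimal recommendation-share probe using OpenAI's Python SDK. The brand name, scenario prompts, model choice, and sample count are all placeholders, and each scenario is sampled several times precisely because responses vary; a real tracker would also parse which competitors win and why:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

BRAND = "YourTool"  # hypothetical brand name
SCENARIOS = [       # realistic buyer prompts you care about
    "What's the best project management tool for a non-technical "
    "team of 15 that lives in Slack?",
    "Compare project management tools for an early-stage startup.",
]
SAMPLES = 5  # responses vary, so sample each scenario repeatedly

mentions = 0
for prompt in SCENARIOS:
    for _ in range(SAMPLES):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # example model; use whatever you track
            messages=[{"role": "user", "content": prompt}],
        )
        # Crude string match; production trackers would use sturdier parsing.
        if BRAND.lower() in resp.choices[0].message.content.lower():
            mentions += 1

share = mentions / (len(SCENARIOS) * SAMPLES)
print(f"{BRAND} appeared in {share:.0%} of sampled answers")
```

Run weekly, a probe like this gives you a trend line for the thing you actually care about: how often the model recommends you when the buyer's constraints match your strengths.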
The mirage
Prompt volume data promises insight into the black box of AI conversations. But the data is hopelessly biased, the search space is too vast for patterns to emerge, the conversational context is missing, and you can't see the responses anyway.
The instinct behind prompt volume data is understandable: marketers want to know what their customers are asking so they can show up with the right answer. But the execution is flawed because it tries to apply a search-engine framework to a system that doesn't work like a search engine.
It's not that the people building these tools are acting in bad faith. When confronted with something new, we reach for familiar frameworks. Prompt volume feels like keyword volume; prompt share feels like search share. The analogy is intuitive. It's just wrong.
The sooner marketers stop chasing this mirage, the sooner they can focus on what actually moves the needle: understanding and changing how AI models think about their brand.