Prompt Volume Data Is Stolen And Useless

Marketing teams should avoid solutions built on prompt volume data. It's attractive at first glance, but unethical, misleading, and useless.

Keller Maloney

Unusual - Founder

Apr 19, 2026

The "real prompt data" that AEO and GEO vendors sell as a window into what buyers are actually asking ChatGPT is scraped from a small set of users who unwittingly installed spyware Chrome extensions. Those extensions recorded their ChatGPT conversations and sold them to a data broker. That is the clearest conclusion from [security research published this month](https://www.koi.ai/blog/urban-vpn-browser-extension-ai-conversations-data-collection).

Beyond the ethical violation, the data is also incomplete. It samples the wrong users, captures only fragments of the conversations they have, and omits the user's intent. It is bad data, and teams who rely on it will draw bad conclusions.

The takeaway: marketers evaluating AI brand perception tools should avoid solutions built on prompt volume data. Teams that don't will waste effort optimizing for a picture of buyer behavior that the dataset cannot actually provide.

## Secret prompt harvesting

Koi Security found that Urban VPN, a Chrome extension with over 6 million users and a "Featured" badge in both the Chrome and Edge stores, has been secretly harvesting every AI conversation its users have had since a July 2025 silent auto-update. Across Urban VPN and its sister extensions from the same publisher, over 8 million users are affected.

The harvested data includes prompts, model responses, and timestamps across ChatGPT, Claude, Gemini, Copilot, Perplexity, DeepSeek, Grok, and Meta AI. The extension intercepts this traffic and ships it to a data broker called BiScience, which packages and resells it as "marketing analytics." Users never consented. The store listings claimed user data was not sold to third parties. It was.

AI search analytics vendors buy this data from BiScience and resell it to their customers under the banner of "understanding what your buyers are asking AI."

## Why they need to steal prompt data

Real AI users aren't opting in to have their most candid AI conversations recorded at scale. The author of the Koi Security article puts it well: "I'd developed a level of candor with my AI assistant that I don't have with most people in my life."

OpenAI doesn't release prompt data. Neither does Anthropic, Google, or any other major AI provider. There is no legitimate, at-scale source of real user prompts for purchase.

The data exists because someone took it. Any vendor claiming to offer real user prompts at scale is working from data collected without users' knowledge.

## Even if the data were clean, it would be useless

Even if the data had been collected with full consent and a clean audit trail, it still would not answer the question it is being sold to answer. Three flaws in the data itself make it effectively useless as a foundation for strategy.

**The sample is biased.** The dataset isn't what people ask ChatGPT. It's what people who installed a shady VPN extension ask ChatGPT. The sophistication of that group almost certainly doesn't match that of an enterprise buyer.

**The data is too narrow.** These datasets count individual prompts, not entire conversations, and that is not how buyers actually use AI. The average ChatGPT conversation is eight messages long. Buyers start broad, add constraints like budget, integrations, and team size, and only accept a recommendation once the model has absorbed their full context. A dataset of isolated prompts can't reconstruct the conversation around each one.

**You can't see the user's context.** AI models personalize their responses to the person asking. If 1,000 people ask an AI model "What car should I buy?", it may give each of them a different answer: an SUV for a family with kids, a sports car for a single person. Even with a perfect list of buyer prompts, you cannot see what ChatGPT is telling those buyers about you, because the answer depends on who is asking and what they have already told the model.

## The implication

Marketing teams should avoid solutions built on prompt volume data. It's attractive at first glance, but unethical, misleading, and useless.

If you're evaluating an AI brand tracking tool and the pitch depends on access to real user prompts at scale, ask how the data was collected, and how it was sampled.

There is no good answer. That's why prompt volume is a dead end.