INSIGHTS

What Content Are AI Models Reading?

We analyzed hundreds of thousands of conversations with AI models to determine what kind of content (short-form, long-form, directories, documentation, etc.) the models prefer when collecting sources to answer questions. We found that they consulted long-form content more often than all other types combined. The implication is that marketers should make information about their business available as long-form content rather than only on website pages or landing pages.

Keller Maloney

Unusual - Founder

Dec 11, 2025

There's a lot of speculation about what AI models read when they answer questions. People mention Reddit, Wikipedia, YouTube transcripts, and more. But when it comes to business and product questions—the kinds of queries that lead to purchases—what sources are models actually consulting?

We reviewed hundreds of thousands of conversations with ChatGPT, Gemini, and Perplexity and millions of sources to find out.

The question

When someone asks ChatGPT "what's the best project management tool for a remote team?" or "which CRM integrates natively with Slack?", the model often searches the web before responding. It reads pages, synthesizes information, and formulates an answer.

But which pages? And what kinds of content get consulted versus ignored?

This matters for any company trying to influence how AI models talk about them. If models aren't reading your website, it doesn't matter how good your website is.

Methodology

Unusual has processed over a million prompts on its platform since launch. When AI models search the web to answer a question, we record the searches they make and the sources they consult—not just the citations that appear in the final answer, but every page the model reads during its research process.

For this analysis, we sampled 500 sources across ChatGPT, Gemini, and Perplexity. We then had an LLM visit each page and classify it along two dimensions:

Content length: Was the page long-form content, short-form content, or a directory site?

Page type: What specific kind of page was it? (Blog post, landing page, documentation, etc.)

We classified the following as long-form content: blog posts, articles, press releases, guides, news posts, documentation pages, case studies, and whitepapers. These are pages where the primary purpose is to convey substantive information in prose.

We classified the following as short-form content: landing pages, website homepage and sub-pages, product feature pages, FAQ pages, pricing pages, and signup/lead forms. These are pages optimized for conversion or quick consumption rather than depth.

Directory sites included Reddit, Wikipedia, Quora, G2, and similar aggregation platforms.
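The bucket definitions above can be written down as a small lookup. This is only a sketch of the stated rules, not the LLM-based classifier used in the analysis; the function name, the `unclassified` fallback, the domain list, and the choice to let a directory domain override the page type are all assumptions made for illustration.

```python
# Hypothetical lookup built from the definitions in this section.
LONG_FORM = {
    "blog post", "article", "press release", "guide",
    "news post", "documentation", "case study", "whitepaper",
}
SHORT_FORM = {
    "landing page", "website page", "product feature page",
    "faq page", "pricing page", "signup/lead form",
}
# Assumed domains for the directory sites named in the text.
DIRECTORY_DOMAINS = {"reddit.com", "wikipedia.org", "quora.com", "g2.com"}


def content_bucket(page_type: str, domain: str = "") -> str:
    """Map a page type (and optionally its domain) to a content-length bucket."""
    host = domain.lower()
    # Assumption: a page hosted on a directory site counts as "directory"
    # regardless of its page type.
    if any(host == d or host.endswith("." + d) for d in DIRECTORY_DOMAINS):
        return "directory"
    kind = page_type.strip().lower()
    if kind == "directory":
        return "directory"
    if kind in LONG_FORM:
        return "long-form"
    if kind in SHORT_FORM:
        return "short-form"
    # Edge cases (a changelog, for example) are left for manual review.
    return "unclassified"


print(content_bucket("Blog post"))                    # long-form
print(content_bucket("Pricing page"))                 # short-form
print(content_bucket("Article", "en.wikipedia.org"))  # directory
```

A fixed lookup like this cannot settle ambiguous pages, which is one reason a classifier that reads the page itself is a reasonable choice.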

Results

The headline finding: AI models overwhelmingly favor long-form content over short-form content.



| Content Type | % of Sources Consulted |
| --- | --- |
| Long-form | 66.0% |
| Short-form | 26.0% |
| Directory | 8.0% |

This pattern was consistent across all three models:

| Model | Long-form | Short-form | Directory |
| --- | --- | --- | --- |
| ChatGPT | 68.3% | 24.8% | 6.8% |
| Gemini | 66.5% | 25.5% | 8.1% |
| Perplexity | 63.2% | 27.6% | 9.2% |
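One way to put a number on how closely the three models agree is to take the gap between the highest and lowest share in each category. The sketch below uses only the per-model percentages reported above; the variable and function names are ours.

```python
# Per-model shares (percent of sources consulted), copied from the results above.
SHARES = {
    "ChatGPT": {"long-form": 68.3, "short-form": 24.8, "directory": 6.8},
    "Gemini": {"long-form": 66.5, "short-form": 25.5, "directory": 8.1},
    "Perplexity": {"long-form": 63.2, "short-form": 27.6, "directory": 9.2},
}


def spread(category: str) -> float:
    """Gap in percentage points between the highest and lowest model."""
    values = [shares[category] for shares in SHARES.values()]
    return round(max(values) - min(values), 1)


for category in ("long-form", "short-form", "directory"):
    print(f"{category}: spread of {spread(category)} percentage points")

# Long-form outweighs the other two categories combined for every model.
long_form_majority = all(
    s["long-form"] > s["short-form"] + s["directory"] for s in SHARES.values()
)
print("long-form majority in every model:", long_form_majority)
```

The largest gap is in the long-form column, at about five percentage points between ChatGPT and Perplexity.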

When we broke this down by specific page type, the dominance of blog content became even clearer:

| Page Type | % of Sources |
| --- | --- |
| Blog post | 37.1% |
| Landing page | 19.2% |
| Website page | 8.7% |
| Directory | 8.0% |
| Documentation | 5.4% |
| Article | 4.7% |
| Press release | 4.3% |
| Guide | 3.5% |
| News post | 3.3% |
| Product feature page | 2.9% |
| Case study | 0.8% |
| FAQ page | 0.8% |
| Pricing page | 0.6% |
| Whitepaper | 0.4% |
| Signup/lead form | 0.2% |

Blog posts alone accounted for more than a third of all sources consulted. Landing pages—the workhorses of most B2B marketing strategies—represented less than one-fifth.

One detail stood out: when models did visit a company's website, they almost always stopped at the root domain. Sub-pages like /integrations, /features, or /pricing were rarely consulted. Only about 3% of all sources were product-specific sub-pages on company websites. Technical documentation appeared at almost twice that rate.
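The comparisons in the last two paragraphs can be checked directly against the page-type percentages. This is a small arithmetic sketch; it assumes the "about 3%" figure for product-specific sub-pages corresponds to the 2.9% reported for product feature pages.

```python
# Page-type shares (percent of sources), copied from the results above.
PAGE_TYPE_SHARES = {
    "Blog post": 37.1,
    "Landing page": 19.2,
    "Documentation": 5.4,
    "Product feature page": 2.9,  # assumed to be the "about 3%" figure
}

# "More than a third" and "less than one-fifth" of all sources.
blog_over_a_third = PAGE_TYPE_SHARES["Blog post"] > 100 / 3
landing_under_a_fifth = PAGE_TYPE_SHARES["Landing page"] < 100 / 5

# How much more often documentation appears than product feature pages.
docs_vs_feature_pages = (
    PAGE_TYPE_SHARES["Documentation"] / PAGE_TYPE_SHARES["Product feature page"]
)

print("blog posts exceed one third:", blog_over_a_third)
print("landing pages below one fifth:", landing_under_a_fifth)
print(f"documentation vs. product feature pages: {docs_vs_feature_pages:.2f}x")
```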

Why this happens

The bias toward long-form content isn't arbitrary. There are a few plausible explanations:

Information density. Models are trying to answer questions. Long-form content—by definition—contains more information per page. A 2,000-word blog post about CRM integrations gives a model more to work with than a 200-word landing page that says "We integrate with everything."

Structural legibility. Landing pages are optimized for human attention, not machine parsing. They're often heavy on styling, imagery, and JavaScript interactivity. Many landing pages render poorly—or not at all—when accessed by a crawler that can't execute JavaScript. Blog posts, by contrast, tend to be structurally simple: headings, paragraphs, maybe a few images.
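To make the JavaScript point concrete, here is a rough sketch of what a fetcher that does not execute scripts could extract from two pages. Both HTML snippets and the "Acme" product are invented for illustration, and real crawlers differ in how they parse pages; the sketch only counts words outside `<script>` and `<style>` tags, using Python's standard-library HTML parser.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect words that appear outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def visible_word_count(html: str) -> int:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(parser.words)


# Hypothetical landing page whose content is rendered client-side.
JS_SHELL = """
<html><head><title>Acme</title><script src="/bundle.js"></script></head>
<body><div id="root"></div>
<script>window.__APP__ = {integrations: 20};</script></body></html>
"""

# Hypothetical blog post served as plain HTML.
BLOG_POST = """
<html><head><title>Acme integrations guide</title></head>
<body><article><h1>How Acme integrates with Slack</h1>
<p>Acme posts task updates to any Slack channel and syncs comments both ways.</p>
</article></body></html>
"""

print("landing page shell:", visible_word_count(JS_SHELL), "words")
print("blog post:", visible_word_count(BLOG_POST), "words")
```

In this toy case the script-rendered page exposes only its title, while the blog post exposes every sentence it contains.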

Third-party corroboration. Models seem to treat third-party content as more trustworthy than first-party marketing claims. A blog post on an industry publication that mentions your product carries more weight than your own feature page making the same claim. This is consistent with how humans evaluate credibility too.

Directory sites as starting points. The 8% figure for directory sites like Reddit, G2, and Wikipedia understates their influence. In our observations, models often consult these sources early in their research process—to get a lay of the land—and then follow links to more substantive content. Directories function as discovery mechanisms, not final sources.

How we got here

This analysis originated from a specific customer problem. We were working with a software company that ChatGPT consistently described as having "limited integrations." In reality, they integrate with 20+ third-party tools, all documented on a dedicated /integrations page.

ChatGPT wasn't lying or hallucinating. It simply wasn't reading the page.

The company's integration information lived on a single landing page—well-designed for humans, but less visible to AI. Meanwhile, competitors who had written blog posts about their integrations (or earned coverage from third-party publications) were getting credit for capabilities that were, in some cases, less robust.

For many companies, key information (which industries you serve, how your integrations work, who your product is for) is atomized across dozens of landing pages. Each page is optimized for a human who arrives with a specific intent.

That structure is good for conversion rate optimization and SEO. But an AI model trying to build a holistic picture of your product has no single place to find one.

It was the right call for a world where search engines drove traffic to landing pages. However, if important information about your product lives only on landing pages today, as it does for many companies, AI models may never see it.

Implications

Long-form content is more legible to AI. This is the clearest takeaway. Whether because of information density, structural simplicity, or crawlability, blog posts and articles get consulted at roughly twice the rate of landing pages (41.8% versus 19.2% of sources in our sample). If there's something you need AI models to understand about your product, it's probably worth writing about at length.

AI models don't navigate websites the way humans do. Models rarely click through your site architecture. They land on a page, read it, and move on. Information that requires clicking through /solutions → /industries → /healthcare may never get seen. This doesn't mean you should restructure your website—but it might mean duplicating key information in formats models are more likely to encounter.

The content you've already written may be underleveraged. Many companies have extensive documentation, integration guides, or help center articles that contain exactly the kind of detailed, specific information AI models seem to prefer. But if that content isn't linked from places models discover (blogs, third-party sites, directories), it might be invisible.

Third-party sources matter. Directory sites made up only 8% of sources in our sample, but they often appeared early in the research process. Getting mentioned on G2, in community discussions, or in industry publications creates pathways for models to find your owned content.

We're not suggesting you abandon your landing pages or rewrite your website. Humans still visit, and conversion still matters. But if you're thinking about how AI models learn about your product, the answer is probably not your product feature pages.

Limitations

A few caveats worth noting:

This analysis covers sources consulted during web search, not the model's parametric knowledge (what it learned during training). A model might "know" things about well-established brands without needing to search. This analysis is most relevant for emerging companies or specific product details that aren't part of the model's training data.

Classification was performed by an LLM, which introduces some noise. We manually reviewed a subset and found the classifications to be reasonable, but edge cases exist. (Is a product changelog a blog post or documentation? Reasonable people could disagree.)

The full data

We're happy to share the complete breakdown with anyone who's interested, including the raw classification data and per-model splits. Email us at founders@unusual.ai with the subject line "Full Report" and we'll send it over.

The broader point is simple: AI models don't read the web the way humans do. Understanding what they actually read is the first step toward influencing what they say.
