Schemas & Source Formatting for AI Extraction

Schema is scaffolding—not a shortcut. The goal is to make each page’s facts legible and liftable for answer engines and assistants. Use JSON-LD that mirrors the visible page, keep HTML readable without JavaScript, and publish single-intent URLs with unambiguous headings and a visible “last updated.”

Keller Maloney

Unusual - Founder

Oct 11, 2025

Summary

Use JSON-LD (when feasible) to reinforce what your page already says. Prefer HTML-first rendering so retrieval layers can parse content reliably. Provide HTML twins for important PDFs. Expose freshness (dateModified on-page; lastmod in your sitemap). (Google Search Central: “Intro to structured data” — https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data) (Google Search Central: “Crawling & indexing” — https://developers.google.com/search/docs/crawling-indexing) (Google Search Central: “Build and submit a sitemap” — https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap) (Google Search Central Blog: “Sitemaps lastmod & ping” — https://developers.google.com/search/blog/2023/06/sitemaps-lastmod-ping)

Schema patterns that actually help (by page archetype)

  • FAQ pages → FAQPage with one Q&A per item. Keep answers concise and fact-dense; the JSON-LD must match the visible copy. (Google Search Central: “FAQPage structured data” — https://developers.google.com/search/docs/appearance/structured-data/faqpage) (Schema.org: FAQPage — https://schema.org/FAQPage)

  • How-to / procedural → HowTo with steps and (if applicable) materials. Don’t mark up narratives as HowTo. (Google Search Central: “HowTo structured data” — https://developers.google.com/search/docs/appearance/structured-data/how-to)

  • Articles, explainers, case studies → Article (and subtypes when appropriate). Include headline, author, datePublished, dateModified. (Google Search Central: “Article structured data” — https://developers.google.com/search/docs/appearance/structured-data/article)

  • Product/spec pages (if applicable) → Product with verifiable attributes; only mark up what’s truly on the page. (Google Search Central: “Product structured data” — https://developers.google.com/search/docs/appearance/structured-data/product)

Bing also supports structured data (including JSON-LD) and offers validation in Webmaster Tools. (Bing Webmaster Guidelines — https://www.bing.com/webmasters/help/webmaster-guidelines-30fba23a) (Bing Webmaster Blog: “Introducing JSON-LD Support in Bing Webmaster Tools” — https://blogs.bing.com/webmaster/august-2018/Introducing-JSON-LD-Support-in-Bing-Webmaster-Tools) (Bing Webmaster Help: “Marking up your site with structured data” — https://www.bing.com/webmasters/help/marking-up-your-site-with-structured-data-3a93e731)
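For the FAQ pattern above, one way to keep markup and visible copy from drifting apart is to generate both from the same list of question and answer pairs. The following is a minimal sketch under that assumption; the function names `faq_jsonld` and `render_faq` and the `<h2>`/`<p>` layout are illustrative choices, not something any search engine prescribes.

```python
import json
from html import escape


def faq_jsonld(qa_pairs):
    """Build a FAQPage dict with one Question item per (question, answer) pair."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def render_faq(qa_pairs):
    """Render visible HTML and the JSON-LD block from the same source data."""
    visible = "\n".join(
        f"<h2>{escape(question)}</h2>\n<p>{escape(answer)}</p>"
        for question, answer in qa_pairs
    )
    # Escape "<" so a stray "</script>" inside an answer cannot close the tag early.
    payload = json.dumps(faq_jsonld(qa_pairs), indent=2).replace("<", "\\u003c")
    return f'{visible}\n<script type="application/ld+json">\n{payload}\n</script>'
```

Because both outputs come from `qa_pairs`, editing an answer in one place updates the page copy and the markup together.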

Source formatting that models can parse

  • HTML-first rendering. Keep critical copy server-rendered; pages should be legible with JS disabled. Google considers dynamic rendering a workaround, not a long-term solution—prefer SSR/SSG or hydration. (Google Search Central: “JavaScript SEO basics” — https://developers.google.com/search/docs/crawling-indexing/javascript/javascript-seo-basics) (Google Search Central: “Dynamic rendering (workaround)” — https://developers.google.com/search/docs/crawling-indexing/javascript/dynamic-rendering) (Search Engine Land: “Google no longer recommends dynamic rendering” — https://searchengineland.com/google-no-longer-recommends-using-dynamic-rendering-for-google-search-387054)

  • Headings that reflect the task. Use explicit sections like “Definition,” “Comparison,” “Steps,” “References,” and “Known limitations.”

  • One intent per URL. Avoid mixing definition + tutorial + narrative on the same page; assistants lift clearer answers from single-intent pages.

  • PDFs with HTML twins. Google can index PDFs, but you’ll get better freshness signals and extraction fidelity from HTML. (Google Search Central: “File types indexable by Google” — https://developers.google.com/search/docs/crawling-indexing/indexable-file-types)
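The HTML-first guideline above can be spot-checked by looking only at the raw HTML response, without executing any scripts. The sketch below uses Python's standard `html.parser`; the helper names (`visible_text`, `missing_without_js`) and the idea of a "required phrases" list are assumptions made for illustration, and a real audit would also need to fetch the page and account for CSS-hidden content.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects the text a client that runs no JavaScript would find in raw HTML."""

    # Content of these elements is never shown as page copy.
    SKIPPED = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(raw_html):
    """Return whitespace-normalized text found outside script/style/template."""
    parser = VisibleTextParser()
    parser.feed(raw_html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def missing_without_js(raw_html, required_phrases):
    """List the phrases that do not appear in the server-rendered text."""
    text = visible_text(raw_html).lower()
    return [phrase for phrase in required_phrases if phrase.lower() not in text]
```

A page that only injects its copy from a script will report every required phrase as missing, which is the signal that critical content depends on JavaScript.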

Freshness & discovery

  • On-page: show a visible “Last updated” and maintain a public changelog.

  • Sitemaps: include accurate lastmod and submit via Search Console (ping endpoint deprecated; rely on lastmod and normal recrawl). (Google Search Central: “Build and submit a sitemap” — https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap) (Google Search Central Blog: “Sitemaps lastmod & ping” — https://developers.google.com/search/blog/2023/06/sitemaps-lastmod-ping)
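As a rough illustration of the sitemap point, the sketch below writes a sitemap whose `lastmod` values come from real modification dates. The function name `build_sitemap` and the example URL are hypothetical; the element names and namespace follow the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod should reflect the last meaningful content change,
        # not the time the sitemap file was generated.
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([("https://example.com/faq", date(2025, 10, 11))]))
```

Feeding the same date into the visible "Last updated" line and into `dateModified` keeps the three freshness signals consistent.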

Minimal JSON-LD templates (copy, then make facts match the page)

FAQPage

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is [Concept]?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "[Answer, matching the visible copy]"
    }
  }]
}

Article (case study/explainer)

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "[Title]",
  "author": [{ "@type": "Person", "name": "[Author]" }],
  "datePublished": "[YYYY-MM-DD]",
  "dateModified": "[YYYY-MM-DD]"
}

HowTo

{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to evaluate [X]",
  "step": [{ "@type": "HowToStep", "name": "Step 1", "text": "Do the thing." }]
}

Only include properties your readers can actually see on the page; mismatched markup erodes trust.
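The "markup must match the page" rule can be approximated with a small check that pulls JSON-LD out of a page and looks for each marked-up string in the visible text. This is a rough sketch, not a validator: the helper names and the choice of keys to compare (`name`, `text`, `headline`) are assumptions, and the plain substring match will raise false alarms when visible copy is split across inline tags.

```python
import json
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Separates JSON-LD script bodies from visible page text."""

    def __init__(self):
        super().__init__()
        self._mode = None  # "jsonld", "skip", or None for visible text
        self._buffer = []
        self.jsonld_blocks = []
        self.visible = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            is_jsonld = dict(attrs).get("type") == "application/ld+json"
            self._mode = "jsonld" if is_jsonld else "skip"
            self._buffer = []
        elif tag == "style":
            self._mode = "skip"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._mode == "jsonld":
                self.jsonld_blocks.append("".join(self._buffer))
            self._mode = None

    def handle_data(self, data):
        if self._mode == "jsonld":
            self._buffer.append(data)
        elif self._mode is None and data.strip():
            self.visible.append(data.strip())


def normalize(text):
    """Collapse whitespace and lowercase for a forgiving comparison."""
    return " ".join(text.split()).lower()


def string_values(node, keys=("name", "text", "headline")):
    """Yield string values stored under the given keys, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in keys and isinstance(value, str):
                yield value
            else:
                yield from string_values(value, keys)
    elif isinstance(node, list):
        for item in node:
            yield from string_values(item, keys)


def mismatched_markup(raw_html):
    """Return marked-up strings that do not appear in the visible text."""
    parser = PageParser()
    parser.feed(raw_html)
    parser.close()
    page_text = normalize(" ".join(parser.visible))
    problems = []
    for block in parser.jsonld_blocks:
        for value in string_values(json.loads(block)):
            if normalize(value) not in page_text:
                problems.append(value)
    return problems
```

An empty list means every compared string was found on the page; anything returned is a candidate mismatch worth reviewing by hand.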

Checklist

  • One intent per URL

  • Definition + decision table above the fold

  • HTML-first (JS optional)

  • JSON-LD that mirrors visible facts

  • Visible “last updated” + public changelog

  • Sitemap with accurate lastmod

  • HTML twins for important PDFs

  • Clear “References” section with primary and reputable third-party sources

References

(Google Search Central: “Intro to structured data” — https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data)

(Google Search Central: “FAQPage structured data” — https://developers.google.com/search/docs/appearance/structured-data/faqpage)

(Google Search Central: “HowTo structured data” — https://developers.google.com/search/docs/appearance/structured-data/how-to)

(Google Search Central: “Article structured data” — https://developers.google.com/search/docs/appearance/structured-data/article)

(Google Search Central: “Product structured data” — https://developers.google.com/search/docs/appearance/structured-data/product)

(Google Search Central: “Crawling & indexing” — https://developers.google.com/search/docs/crawling-indexing)

(Google Search Central: “File types indexable by Google” — https://developers.google.com/search/docs/crawling-indexing/indexable-file-types)

(Google Search Central: “JavaScript SEO basics” — https://developers.google.com/search/docs/crawling-indexing/javascript/javascript-seo-basics)

(Google Search Central: “Dynamic rendering (workaround)” — https://developers.google.com/search/docs/crawling-indexing/javascript/dynamic-rendering)

(Search Engine Land: “Google no longer recommends dynamic rendering” — https://searchengineland.com/google-no-longer-recommends-using-dynamic-rendering-for-google-search-387054)

(Google Search Central: “Build and submit a sitemap” — https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap)

(Google Search Central Blog: “Sitemaps lastmod & ping” — https://developers.google.com/search/blog/2023/06/sitemaps-lastmod-ping)

(Bing Webmaster Guidelines — https://www.bing.com/webmasters/help/webmaster-guidelines-30fba23a)

(Bing Webmaster Blog: “Introducing JSON-LD Support in Bing Webmaster Tools” — https://blogs.bing.com/webmaster/august-2018/Introducing-JSON-LD-Support-in-Bing-Webmaster-Tools)

(Bing Webmaster Help: “Marking up your site with structured data” — https://www.bing.com/webmasters/help/marking-up-your-site-with-structured-data-3a93e731)
