Structured data for AI agents: the schema.org markup assistants use

Schema.org and JSON-LD don't force AI to cite you, but they hand ChatGPT, Perplexity, and Google AI machine-readable facts. Types, errors, free 0–100 scan.

By IMozz
Updated 2026-06-06

Schema.org and JSON-LD are not a secret boost for ChatGPT or Google AI. They are something more durable: an explicit, machine-readable layer that tells search and AI systems who you are, what you offer, and how the facts on a page relate to each other. Google says it uses structured data found on the web to understand page content and gather information about the world, and schema.org describes itself as a shared vocabulary for structured data used by search engines and other applications (Google). That is the honest promise — no more, no less.

It helps to picture structured data as the top layer of a stack, not a standalone trick. A bot has to reach the page, then read it without executing your JavaScript, and only then can your JSON-LD add precision on top. Skip a lower layer and the markup stops mattering: perfect schema on a page an assistant can't fetch, or can't parse without a browser, buys you nothing. Schema sharpens a page that is already accessible; it does not rescue one that isn't.

Key takeaways

  • Schema.org/JSON-LD don't force AI to cite you. They reduce ambiguity, so a system extracts your canonical facts instead of inferring a fuzzier version from prose (Google).
  • Google says there are no extra requirements and no special markup to appear in AI Overviews or AI Mode, and calls overfocusing on structured data a myth (Google).
  • But explicit structure can win: Google says publisher-supplied structured data may be preferred over data it extracts automatically (Google).
  • Add the entities assistants must identify first: Organization + WebSite, then SoftwareApplication, Product + Offer, and FAQPage only where a real Q&A is visible.
  • Markup is only one channel. Agents also read raw HTML and the accessibility tree, so schema doesn't replace semantic HTML (web.dev). aiSiteReady checks all of these and scores your site 0–100.

Does schema.org help ChatGPT and Google AI?

The fact-checked answer is: yes, but indirectly, and never as a magic switch. Be precise here, because the topic attracts overclaiming.

For Google's AI features, the documentation is blunt. Google states there are no additional requirements to appear in AI Overviews and AI Mode, that no special "optimizations" are needed, and that ordinary SEO best practices still apply. It explicitly files "you need special markup files" and "you need separate AI optimization" under myths, and says overfocusing on structured data is unnecessary (Google). So you cannot honestly sell schema as a backdoor into AI summaries.

Yet undervaluing it is just as wrong. Google also says explicit clues about a page's meaning help Search understand it, and its "pros and cons" example shows that when a publisher supplies structured data, Google may prefer it over data it would otherwise extract automatically (Google). That is the real mechanism: schema doesn't make AI cite you; it raises the odds that a system reads your canonical facts rather than reconstructing a less accurate version from free text.

For ChatGPT, phrase it carefully. OpenAI publishes no "add type X and property Y to rank in ChatGPT Search" checklist. It says ranking depends on many factors, that top positions can't be guaranteed, and that the practical requirement to be surfaced is not blocking OAI-SearchBot (OpenAI). Because ChatGPT Search rewrites queries into sub-queries and can lean on third-party search providers, schema most plausibly helps it through the retrieval systems it depends on — an inference from the architecture, not a stated OpenAI ranking factor.

Why JSON-LD helps AI, but isn't magic

Treat JSON-LD as the most convenient transport layer for meaning, not as an SEO hack. Google recommends JSON-LD precisely because it is easier to deploy and maintain at scale, and the W3C defines it as a JSON-compatible serialization of linked data designed to slot into systems that already use JSON (W3C). It is a standard way to make facts machine-readable, not a trick aimed at language models.

The retrieval picture explains why that matters. Google's own guide to generative AI features describes them as RAG plus query fan-out over the regular Search index: the system retrieves relevant pages, then uses specific passages from those pages to generate an answer with prominent links to sources (Google). The less ambiguous your entities and facts, the more reliably a system can pull the right passage and ground its answer in it.

Figure: Two paths compared. A labeled JSON-LD card flows through a solid arrow into an AI node and yields exact extracted facts: name, price, and type read directly. Below it, an unstructured paragraph blob flows through a faded dashed arrow into an AI node and yields only a best guess inferred from prose that may be wrong.

There is a hard limit worth stating plainly: structured data is one channel among several. Google's web.dev guidance says agents perceive a site three ways — screenshots, raw HTML, and the accessibility tree — and OpenAI adds that accessibility and ARIA labels help its ChatGPT agent understand a site's structure and interactive elements (web.dev). The takeaway for builders: schema describes facts about entities; semantic HTML and accessibility describe structure and actions. JSON-LD does not replace either.

Which schema.org types should you add first?

Don't start from "the maximum number of types." Start from the entities an assistant has to recognize and cite without ambiguity. Google's advice is to use the most specific applicable types and property names, and — when a page has several related entities — to connect them with a shared @id (Google).

Type | Use it when | Key properties AI extracts
Organization + WebSite | Almost every site: the identity layer | name, url, logo, legalName, sameAs, contactPoint
SoftwareApplication | SaaS, web/desktop apps, tool pages | name, applicationCategory, operatingSystem, offers, aggregateRating
Product + Offer | Commerce and catalog pages | name, image, brand, offers.price, offers.priceCurrency, availability
FAQPage | A page with a real, user-visible Q&A | mainEntity, Question.name, acceptedAnswer.text

Organization + WebSite is the base identity layer, so an assistant knows who publishes the site, its official URL, what logo to use, and which external profiles confirm that identity. Note the difference between url (the entity's own official URL) and sameAs (external pages that unambiguously confirm it). Use one canonical @id per entity and reuse it.

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://example.com/#org",
      "url": "https://example.com/",
      "name": "Acme AI",
      "legalName": "Acme AI LLC",
      "logo": "https://example.com/static/logo.png",
      "sameAs": [
        "https://www.linkedin.com/company/acme-ai/",
        "https://github.com/acme-ai"
      ],
      "contactPoint": [
        { "@type": "ContactPoint", "contactType": "sales", "email": "sales@example.com" }
      ]
    },
    {
      "@type": "WebSite",
      "@id": "https://example.com/#website",
      "url": "https://example.com/",
      "name": "Acme AI",
      "inLanguage": "en-US",
      "publisher": { "@id": "https://example.com/#org" }
    }
  ]
}

That shared @id is what lets a consumer treat the Organization and the WebSite as the same connected identity, rather than two unrelated blobs. The same discipline pays off across pages: link the Product, its Organization, and the WebSite through stable identifiers.
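One way to catch a broken link between entities is to check that every @id used as a reference also appears as a definition somewhere in the same document. The sketch below is a minimal illustration under stated assumptions, not a validator: the function name find_dangling_ids is hypothetical, and it treats any object containing only "@id" as a reference and any object with additional keys as a definition.

```python
import json


def find_dangling_ids(doc):
    """Return @id references that point at no node defined in the document."""
    defined, referenced = set(), set()

    def walk(node):
        if isinstance(node, dict):
            node_id = node.get("@id")
            if node_id is not None:
                # Assumption: a dict holding only "@id" is a reference;
                # a dict with more keys defines that entity.
                if set(node) == {"@id"}:
                    referenced.add(node_id)
                else:
                    defined.add(node_id)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(doc)
    return sorted(referenced - defined)


# The publisher reference below uses "#organization", but the entity is
# defined as "#org", so the two nodes are not actually connected.
graph = json.loads("""
{
  "@context": "https://schema.org",
  "@graph": [
    {"@type": "Organization", "@id": "https://example.com/#org", "name": "Acme AI"},
    {"@type": "WebSite", "@id": "https://example.com/#website",
     "publisher": {"@id": "https://example.com/#organization"}}
  ]
}
""")

print(find_dangling_ids(graph))
```

A reference to an entity that is defined on a different page would also be reported here, so treat the result as a list of things to review rather than a list of errors.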

Figure: Three schema.org entities (Organization, WebSite, and Product), each shown with its own canonical @id, connected by linking arrows into a single coherent graph, contrasted with three disconnected cards that share no id and read as unrelated objects.

SoftwareApplication earns its place on SaaS and app pages, because assistants are often asked very factual questions: is there a free tier, is this web or mobile, what does it cost, who makes it. Useful properties are name, description, applicationCategory, operatingSystem, offers, and aggregateRating. There is no universal "required" list at the vocabulary level — collect the minimum set of facts an agent can extract without reading marketing copy.

{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "@id": "https://example.com/app/#software",
  "name": "Acme Copilot",
  "url": "https://example.com/app/",
  "applicationCategory": "BusinessApplication",
  "operatingSystem": "Web",
  "offers": { "@type": "Offer", "price": "29", "priceCurrency": "USD" },
  "aggregateRating": { "@type": "AggregateRating", "ratingValue": "4.8", "ratingCount": "127" }
}

Product + Offer is the most measurable type for commerce. For merchant listings, Google requires at least name, image, and offers on the Product, and inside the Offer an active price (price or priceSpecification.price) and a currency. The availability property is recommended, and Google strongly recommends description, brand, and a GTIN because they improve data quality and verifiability (Google).

{
  "@context": "https://schema.org",
  "@type": "Product",
  "@id": "https://example.com/products/widget-pro#product",
  "name": "Widget Pro",
  "image": ["https://example.com/images/widget-pro.jpg"],
  "brand": { "@type": "Brand", "name": "Acme" },
  "sku": "WIDGET-PRO-1",
  "offers": {
    "@type": "Offer",
    "url": "https://example.com/products/widget-pro",
    "price": "199.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
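The required fields described above can be checked before a template ships. The following is a rough sketch that covers only the fields this article names (name, image, offers, a price, and a currency); it is not a reimplementation of Google's eligibility rules. The function name merchant_listing_gaps is hypothetical, and the sketch assumes priceSpecification, when present, is a single object that may carry price and priceCurrency.

```python
def merchant_listing_gaps(product):
    """List the fields named in this article that a Product dict is missing."""
    gaps = [key for key in ("name", "image", "offers") if not product.get(key)]

    offers = product.get("offers") or []
    if isinstance(offers, dict):  # a single Offer is the common case
        offers = [offers]

    for i, offer in enumerate(offers):
        if not isinstance(offer, dict):
            gaps.append(f"offers[{i}] is not an object")
            continue
        # Assumption: priceSpecification is one object, not a list.
        spec = offer.get("priceSpecification")
        if not isinstance(spec, dict):
            spec = {}
        if "price" not in offer and "price" not in spec:
            gaps.append(f"offers[{i}].price")
        if "priceCurrency" not in offer and "priceCurrency" not in spec:
            gaps.append(f"offers[{i}].priceCurrency")
    return gaps


product = {
    "@type": "Product",
    "name": "Widget Pro",
    "image": ["https://example.com/images/widget-pro.jpg"],
    "offers": {
        "@type": "Offer",
        "price": "199.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
}

print(merchant_listing_gaps(product))                      # nothing missing
print(merchant_listing_gaps({"name": "Widget Pro"}))       # image and offers missing
```

An empty list only means these fields are present. Google's Rich Results Test remains the authority on whether a page is actually eligible.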

For shopping specifically, page-level schema is only part of the story. OpenAI publicly promotes structured product feeds: its Agentic Commerce docs describe merchants submitting a feed that ChatGPT indexes to understand product attributes and show current price and availability, and call those fields a "canonical record" used to display and link the product (OpenAI). So for a catalog, schema.org on the page is a solid base, but a product feed is an even more direct channel into ChatGPT shopping.

FAQPage still works as a strictly structured Q&A format, but in 2026 it needs a caveat. A valid FAQPage is built around mainEntity, where every Question needs a name and an acceptedAnswer, and every Answer needs text. However, Google's current documentation says FAQ rich results stopped showing in Google Search on May 7, 2026, and that FAQ support in the Rich Results Test is being removed in June 2026 (Google). Use it as honest semantic normalization of FAQs your users can actually see — not as a new AI growth lever. And use it only where each question has one official answer; if users can add answers, Google recommends QAPage instead.

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Is there a free plan?",
      "acceptedAnswer": { "@type": "Answer", "text": "Yes. There is a free tier with core features." }
    }
  ]
}
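The structural rule described above (every Question needs a name and an acceptedAnswer, every Answer needs text) is simple enough to check mechanically, and the same pass can confirm that the marked-up text appears in what users see. This is a minimal sketch: faq_problems is a hypothetical helper, and visible_text is assumed to be page text you have already extracted by some other means.

```python
def _norm(text):
    """Collapse whitespace and lowercase so layout differences don't matter."""
    return " ".join(text.split()).lower()


def faq_problems(faq, visible_text=None):
    """Return a list of problems found in a FAQPage dict.

    visible_text is optional: the text a user can read on the page.
    This sketch does not extract it; pass it in if you have it.
    """
    problems = []
    questions = faq.get("mainEntity") or []
    if isinstance(questions, dict):  # tolerate a single Question object
        questions = [questions]
    if not questions:
        problems.append("mainEntity is missing or empty")

    page = _norm(visible_text) if visible_text is not None else None
    for i, question in enumerate(questions):
        name = question.get("name")
        answer = question.get("acceptedAnswer")
        text = answer.get("text") if isinstance(answer, dict) else None
        if not name:
            problems.append(f"mainEntity[{i}]: Question has no name")
        if not isinstance(answer, dict):
            problems.append(f"mainEntity[{i}]: Question has no acceptedAnswer")
        elif not text:
            problems.append(f"mainEntity[{i}]: Answer has no text")
        if page is not None:
            # Plain substring match after normalization; a deliberate simplification.
            for label, value in (("question", name), ("answer", text)):
                if value and _norm(value) not in page:
                    problems.append(f"mainEntity[{i}]: {label} not visible on page")
    return problems


faq = {
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "Is there a free plan?",
        "acceptedAnswer": {"@type": "Answer", "text": "Yes. There is a free tier with core features."},
    }],
}

print(faq_problems(faq))
print(faq_problems(faq, visible_text="Pricing starts at $29 per month."))
```

The visibility check is a substring comparison, so an answer that is paraphrased in the markup will be flagged even if a similar sentence is on the page. That is usually the behavior you want, since the markup is supposed to mirror the visible content.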

What mistakes break the value of JSON-LD?

"We added schema and nothing happened" almost always traces to one of these.

  • Broken JSON. The W3C is explicit that a JSON-LD document must always be valid JSON; if the base JSON is malformed, no consumer can reliably read it as linked data (W3C).
  • A fragmented graph. When the same entity is marked up with different @ids — or none — consumers can't tell it's one object. Google says to use @id to connect related items so Search understands, for example, that a video belongs to a specific recipe (Google). One canonical id per entity, reused everywhere.
  • Markup that doesn't match the page. Google warns that structured data must represent the page's main content, must not describe content hidden from users, and must not be misleading; for FAQs it states the content must be visible on the page (Google). Valid-but-invisible markup destroys trust in all your markup.
  • Missing consumer-specific fields. The vocabulary is flexible, but specific consumers define their own required fields. Without name, image, offers, and a valid price + currency, a Product simply isn't eligible as a merchant listing (Google).
  • The wrong page type. Marking a user-driven support thread as FAQPage is a poor model; Google recommends QAPage when users can contribute answers. Give a page the primary type that matches its real subject.
  • Blocked, stale, or JS-only data. Google asks you not to block pages with structured data via robots.txt or noindex, and to keep time-sensitive data current. OpenAI says OAI-SearchBot must be allowed or your site won't appear in ChatGPT search (OpenAI). And if critical facts only exist after client-side JavaScript runs, the many AI crawlers that don't render JavaScript never see them. Put your semantic core and JSON-LD in the initial HTML.
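Two of the failures above, broken JSON and JSON-LD that only exists after JavaScript runs, can be spotted by reading the raw HTML the way a non-rendering crawler would. The sketch below uses only the Python standard library; the class and function names are hypothetical, and it assumes you already have the HTML of the initial server response as a string.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect <script type="application/ld+json"> blocks from raw HTML."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []  # successfully parsed JSON-LD documents
        self.errors = []  # messages for blocks that are not valid JSON

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError as exc:
                self.errors.append(f"invalid JSON-LD block: {exc.msg}")


def extract_jsonld(html):
    """Return (parsed_blocks, errors) for the JSON-LD found in raw HTML."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks, parser.errors


# No JavaScript is executed here, so anything injected client-side is invisible.
html = """<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Organization", "name": "Acme AI"}</script>
<script type="application/ld+json">{"@type": "Product", "name": "Widget Pro",}</script>
</head><body><div id="app"></div></body></html>"""

blocks, errors = extract_jsonld(html)
print(len(blocks), "valid block(s),", len(errors), "broken block(s)")
```

If this returns no blocks for a page that shows structured data in a browser's developer tools, the markup is probably being injected client-side and should move into the server-rendered HTML.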

How does structured data connect to AI answers and measurement?

For Google's AI features, sell schema as part of clean technical structure and rich-result eligibility, not as a shortcut into summaries — the docs are clear there are no special requirements (Google). The upside is reduced ambiguity, not guaranteed citation.

For ChatGPT, the link runs through crawl access and citability. OpenAI's publisher FAQ says any public site can appear in ChatGPT search, and that to be discovered, surfaced, and clearly cited you must not block OAI-SearchBot — which is a different control from the GPTBot training crawler (OpenAI). If you're unsure which bots you allow, our guide to controlling AI crawlers in robots.txt untangles the matrix. For agentic browsing, OpenAI says ChatGPT's agent understands sites better when they follow accessibility best practices and use ARIA roles and labels — so interface semantics often matter as much as JSON-LD in the <head>.
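Because OAI-SearchBot and GPTBot are separate controls, it helps to test them separately against your robots.txt. Here is a small sketch using Python's standard urllib.robotparser; the crawler_access helper and the sample robots.txt are illustrative assumptions, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def crawler_access(robots_txt, url, agents=("OAI-SearchBot", "GPTBot")):
    """Report, per user agent, whether robots.txt allows fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Sample policy: opt out of the training crawler, stay open to everything else.
robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(crawler_access(robots, "https://example.com/pricing"))
```

With this sample policy the search crawler is allowed while the training crawler is blocked, which is the combination a site wants if it aims to appear in ChatGPT search without contributing to training.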

Measuring the result got easier, too. On June 3, 2026, Google launched Search Console reports for generative AI features, including visibility in AI Overviews, AI Mode, and generative AI in Discover (Google). OpenAI says publishers that allow OAI-SearchBot can track ChatGPT referral traffic via utm_source=chatgpt.com. Neither proves a specific gain came from schema, but together they let you tie technical-structure improvements to changes in AI visibility and referral traffic.
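As a rough illustration of the referral side, landing URLs from an access log or analytics export can be filtered on the utm_source=chatgpt.com parameter mentioned above. The function names and sample URLs below are hypothetical; how you obtain the list of landing URLs depends on your own logging setup.

```python
from urllib.parse import parse_qs, urlsplit


def is_chatgpt_referral(landing_url):
    """True when the landing URL carries utm_source=chatgpt.com."""
    params = parse_qs(urlsplit(landing_url).query)
    return "chatgpt.com" in params.get("utm_source", [])


def chatgpt_referrals_by_path(landing_urls):
    """Count ChatGPT-tagged landings per URL path."""
    counts = {}
    for url in landing_urls:
        if is_chatgpt_referral(url):
            path = urlsplit(url).path or "/"
            counts[path] = counts.get(path, 0) + 1
    return counts


# Hypothetical sample of landing URLs.
hits = [
    "https://example.com/pricing?utm_source=chatgpt.com",
    "https://example.com/pricing?utm_source=newsletter",
    "https://example.com/docs/setup?ref=x&utm_source=chatgpt.com",
    "https://example.com/pricing?utm_source=chatgpt.com&utm_medium=referral",
    "https://example.com/",
]

print(chatgpt_referrals_by_path(hits))
```

Tracking these counts per template before and after a markup change gives a trend to watch. As noted above, it does not prove that schema caused the change.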

How do you validate your markup?

Check in three layers, because "valid per schema.org" and "supported by a specific platform" are not the same thing.

  1. Schema Markup Validator for base syntax and vocabulary-level correctness (schema.org).
  2. Google Rich Results Test to see which rich results Google can generate — consumer-specific eligibility.
  3. URL Inspection / Search Console to see the page as Google does, and to confirm indexing, accessibility, and that nothing is blocking the crawler.

The FAQ case shows why the distinction bites: a perfectly valid FAQPage can no longer be sold as a current Google growth lever now that FAQ rich results are gone, and FAQ support is leaving the Rich Results Test. After that, lean on the schema validator and on content-matches-page checks rather than the rich-result preview.

How do you check your whole site at once?

Validating one template by hand is doable; checking every template, on every release, across structured data and the layers beneath it, is not. That is the job aiSiteReady does. It fetches your site the way an agent would and reports on Meta & structured data (JSON-LD, Open Graph), HTML without JavaScript, robots.txt, sitemap.xml, rules for AI crawlers, and protocol discovery like MCP and OAuth — as a 0–100 score with blockers and prioritized fixes.

This maps to the discoverability and protocol checks in the score, alongside the content-accessibility gate that decides whether an agent can read your page at all. The exact checks and weights live on the methodology page, and this guide is the structured-data deep dive behind what AI agent readiness means. Once your pages are readable, you can also hand assistants a curated reading map with llms.txt.

Run a free scan to see whether ChatGPT, Claude, Perplexity, and Google AI can actually read your JSON-LD, whether your content survives without JavaScript, whether robots.txt and sitemap.xml get in the way, and which fixes will move the needle first — in English, Ukrainian, or Russian.

The short version: schema.org won't make assistants cite you, but it makes your site far easier to read, match, and extract facts from correctly — which is exactly what good AI retrieval is built on.

IMozz has 20 years in software development, with the past year spent building with LLMs. He builds aiSiteReady, a read-only scanner that checks whether AI agents can read a site. It server-renders its own content as a working example.

Frequently asked questions

Does schema.org help ChatGPT or Google AI?
Indirectly, and only as one layer. Google's own documentation says there are no extra requirements and no special markup needed to appear in AI Overviews or AI Mode, and it calls overfocusing on structured data a myth. But the same docs say explicit clues help Search understand a page, and that publisher-supplied structured data can be preferred over data Google extracts automatically. OpenAI publishes no 'add type X to rank in ChatGPT' checklist; it says ranking depends on many factors and that you must not block OAI-SearchBot to be surfaced and cited. So schema reduces ambiguity about your facts — it does not flip a citation switch.
Which schema.org types should I add first?
Start with the entities an assistant must identify without guessing, not the longest list of types. For almost every site that means Organization + WebSite (who publishes this, official URL, logo, verified profiles). Add SoftwareApplication for SaaS and apps, Product + Offer for commerce (Google requires name, image, offers, and an active price plus currency inside the Offer), and FAQPage only where a real, user-visible Q&A exists. Google recommends using the most specific applicable type and linking related entities with a shared @id.
Is FAQPage schema still worth adding in 2026?
Only as honest semantic normalization, not as a growth lever. Google's FAQ documentation states that FAQ rich results stopped showing in Search on May 7, 2026, and that FAQ support in the Rich Results Test is being removed in June 2026. A valid FAQPage can still describe genuine, visible questions and answers in a machine-readable way, which can help AI systems extract them. But the FAQ content must be visible to users on the page, and if users can submit alternative answers you should use QAPage instead.
Does structured data replace semantic HTML for AI agents?
No. Google's web.dev guidance says agents perceive a site three ways — screenshots, raw HTML, and the accessibility tree — and OpenAI says ChatGPT's agent understands a site better when it uses accessibility best practices and ARIA roles and labels. JSON-LD describes facts about entities; semantic HTML and accessibility describe structure, actions, and navigation. You need both. Schema is a clarity layer on top of a page that is already crawlable and readable without running JavaScript.