This study examines whether JSON-LD schema markup independently predicts the probability that a web page will be cited in AI-generated responses. We collected 730 AI citations from ChatGPT (GPT-4o with web browsing) and Gemini (1.5 Pro with search grounding) across 75 commercial queries spanning five categories: SaaS and Technology, Health and Medical, Finance and Insurance, Professional Services, and How-To and DIY. Google top-10 organic results for the same queries were collected via SerpAPI as a control set, yielding 1,006 total unique pages analyzed for schema characteristics and domain authority (Ahrefs DR).
Read the Full Paper
The complete preprint is available on the following platforms:
Fischman, K. (2026). Does schema markup predict AI citation? A cross-platform empirical study of structured data and generative engine optimization. Preprint.
https://doi.org/10.5281/zenodo.18728697
Initial pooled analysis produced a significant negative association between schema presence and AI citation (OR = 0.546, p < .001), which on its face suggested that schema actively reduced citation probability. This finding proved to be a methodological artifact: Google's ranking algorithm yields top-10 organic results that are systematically enriched for schema-bearing pages, which inflates schema prevalence in the non-cited control population. A within-Google diagnostic revealed that schema prevalence among AI-cited and non-cited Google pages was statistically indistinguishable (43.1% vs. 44.8%), collapsing the apparent effect entirely. Corrected models using Generalized Estimating Equations with query-clustered standard errors produced null results for schema presence (OR = 0.678, p = .296), entity richness score (OR = 1.001, p = .833), and schema-to-query alignment (OR = 1.068, p = .626).
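The within-Google diagnostic amounts to comparing two proportions. The following is a minimal sketch of such a comparison using a standard two-proportion z-test; it is not necessarily the test used in the paper. The function name and the counts are hypothetical: the counts were chosen only so that the proportions round to the reported 43.1% and 44.8%, and the actual group sizes should be taken from the full preprint.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for equality of two independent proportions."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled proportion under the null hypothesis that both groups share one rate.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value


# HYPOTHETICAL counts, not taken from the paper: pages with schema / total pages.
p_cited, p_not_cited, z, p_value = two_proportion_z_test(100, 232, 130, 290)
print(f"cited: {p_cited:.1%}, not cited: {p_not_cited:.1%}, z = {z:.2f}, p = {p_value:.2f}")
```

With group sizes of this order, a gap of under two percentage points is well within sampling noise, which is the sense in which the two prevalences are indistinguishable.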
The dominant predictor of AI citation was Google organic rank position (OR = 0.762 per position, p < .001). Position-1 pages were cited in 43% of queries in which they appeared, declining to 5% at position 7. This gradient implies that each one-position drop in rank reduces citation odds by approximately 24%, and that AI citation behavior is substantially mediated by the search backend ranking that precedes AI-level content evaluation.
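The step from OR = 0.762 to "approximately 24%" is simple arithmetic, sketched below. The helper names are illustrative, and the compounding function assumes the per-position odds ratio applies uniformly at every position, which is what a single per-position coefficient implies. Note that the result is a change in odds, not in raw citation probability.

```python
OR_PER_POSITION = 0.762  # reported odds ratio per one-position drop in rank


def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio into a percent change in odds."""
    return (odds_ratio - 1) * 100


def cumulative_odds_multiplier(odds_ratio, positions_down):
    """Odds multiplier after moving down several positions (assumes a constant per-position effect)."""
    return odds_ratio ** positions_down


print(f"{percent_change_in_odds(OR_PER_POSITION):.1f}% change in odds per position")

# Multiplier on the citation odds when moving from position 1 to position 7.
print(f"{cumulative_odds_multiplier(OR_PER_POSITION, 6):.3f}")
```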
One significant exception emerged: pages implementing Product or Review schema with populated concrete attribute fields — pricing, aggregateRating, specifications — were cited at substantially higher rates than pages implementing generic schema types such as Article, Organization, or BreadcrumbList (61.7% vs. 41.6%, p = .012). This attribute-rich advantage was most pronounced among lower-authority domains (DR ≤ 60), consistent with the interpretation that factual payload in structured data partially compensates for weak authority signals. Sophisticated entity-linking techniques — Wikidata sameAs links, genuine @id cross-referencing — appeared on fewer than 4% of schema-present pages and could not be evaluated statistically.
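To make the distinction concrete, here is a minimal sketch of attribute-rich versus generic JSON-LD, built as Python dictionaries. The `is_attribute_rich` helper and its field list are assumptions made for illustration, not the paper's coding scheme; the property names (`offers`, `aggregateRating`, `additionalProperty`, `reviewRating`) are standard schema.org terms, and all values are invented.

```python
import json

# Assumed rule for illustration only; the paper's actual classification may differ.
ATTRIBUTE_RICH_TYPES = {"Product", "Review"}
CONCRETE_FIELDS = ("offers", "aggregateRating", "additionalProperty", "reviewRating")


def is_attribute_rich(node):
    """Return True for a Product/Review node with at least one populated concrete field."""
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    if not ATTRIBUTE_RICH_TYPES.intersection(types):
        return False
    return any(node.get(field) for field in CONCRETE_FIELDS)


# Hypothetical attribute-rich markup: pricing and rating are populated.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Widget",
    "offers": {"@type": "Offer", "price": "49.00", "priceCurrency": "USD"},
    "aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.6", "reviewCount": "128"},
}

# Hypothetical generic markup: valid, but with no extractable factual payload.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example Widget Overview",
}

print(json.dumps(product, indent=2))
print(is_attribute_rich(product), is_attribute_rich(article))
```

The first node gives a retrieval system concrete facts it can quote directly (a price, a rating), while the second only labels the page type.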
These findings support a more precise version of the schema-helps hypothesis than the practitioner consensus has articulated: attribute-rich schema that provides extractable factual content may confer modest citation advantages for lower-authority domains, while generic schema provides none. The dominant practical implication is that traditional organic rank position remains the primary lever for AI visibility, and that GEO-specific optimization efforts are most productive when directed at content quality and authority rather than generic structured data implementation.
Keywords: AI search optimization, schema markup, JSON-LD, structured data, large language models, AI citation, generative engine optimization, GEO, ChatGPT, Gemini, entity-graph, rank position
Cite this paper
@techreport{growthmarshalgm2026001,
author = {Fischman, Kurt},
title = {Does Schema Markup Predict AI Citation?},
institution = {Growth Marshal},
year = {2026},
number = {GM-2026-001},
url = {https://www.growthmarshal.io/research/does-schema-markup-predict-ai-citation}
}
Kurt is the CEO & Founder of Marshal, a Managed AI Delivery service that helps SMBs make AI operational. He builds agentic workflows and AI visibility systems that power modern growth.