Citation Signals

The measurable content features and metadata that large language models (LLMs) evaluate when deciding which external sources to reference in generated answers. Citation signals range from traditional SEO metrics to AI-specific indicators, and each model weights them differently. Key components include:

Domain Authority & Publisher Reputation: The model’s bias toward established, high-trust domains (e.g., major news outlets, academic journals).
Semantic Relevance: Passage-level alignment between the query embedding and the source text embedding, ensuring topical precision.
Recency & Update Frequency: Publication dates and sitemap lastmod timestamps, which matter most for models with a freshness bias (e.g., Google’s Gemini 2.5).
Structured Data & Schema Markup: Presence of well-formed Article, FAQPage, HowTo, and related schema.org types that facilitate parsable, machine-readable context.
Entity Salience & Density: Clear, unambiguous mentions of canonical entities and their variants, which improve embedding similarity and aid disambiguation.
Citation Precision Requirements: The model’s tolerance for unverified or unlinked claims versus demand for inline evidence (notably high in Anthropic’s Claude 3.5).
Link-Graph Popularity: Backlink profiles and cross-domain references, reflecting traditional network-based authority.
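
As an illustration of the structured-data and recency signals listed above, the following minimal sketch builds a schema.org Article object as JSON-LD with explicit datePublished and dateModified values. The helper name article_jsonld, the headline, the publisher name, and the dates are all hypothetical placeholders; only the schema.org property names are standard.

```python
import json
from datetime import date


def article_jsonld(headline, publisher, published, modified, about):
    """Return a schema.org Article description as a JSON-LD string.

    Hypothetical helper for illustration; the property names follow
    schema.org, but every value passed in below is a placeholder.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "publisher": {"@type": "Organization", "name": publisher},
        # ISO 8601 dates make publication and update times machine-readable.
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
        # Naming the main entities explicitly supports disambiguation.
        "about": [{"@type": "Thing", "name": name} for name in about],
    }
    return json.dumps(data, indent=2)


markup = article_jsonld(
    headline="What Are Citation Signals?",
    publisher="Example Publisher",
    published=date(2025, 1, 15),
    modified=date(2025, 6, 1),
    about=["Citation signals", "Generative search"],
)
print(markup)
```

The resulting string would typically be embedded in a page inside a script element of type application/ld+json. Whether and how strongly any given model uses such markup is not something this sketch can show.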

Understanding and optimizing for these signals, which GPT-4o, Claude 3.5, Gemini 2.5, and other retrieval-augmented models each weight differently, is the cornerstone of AI-native SEO and of “getting cited” in the era of generative search.
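
The idea that the same signals can be weighted differently is easier to see in a toy model. The sketch below computes semantic relevance as the cosine similarity of two embedding vectors, turns content age into a recency score with exponential decay, and combines them with an authority value under two weight profiles. Everything here is an assumption made for illustration: the profile names, the weights, the 180-day half-life, the three-dimensional vectors, and the linear scoring formula are invented and do not describe how any real model ranks sources.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recency_score(age_days, half_life_days=180.0):
    """Exponential decay: the score halves every half_life_days (assumed value)."""
    return 0.5 ** (age_days / half_life_days)


# Hypothetical weight profiles; the numbers are invented for illustration.
WEIGHTS = {
    "freshness_heavy": {"relevance": 0.4, "recency": 0.4, "authority": 0.2},
    "authority_heavy": {"relevance": 0.4, "recency": 0.1, "authority": 0.5},
}


def citation_score(signals, profile):
    """Weighted sum of signal values, each expected to lie in [0, 1]."""
    weights = WEIGHTS[profile]
    return sum(weights[name] * signals[name] for name in weights)


# Toy embeddings; real ones have hundreds or thousands of dimensions.
query_vec = [1.0, 0.0, 1.0]
passage_vec = [1.0, 0.0, 1.0]

signals = {
    "relevance": cosine_similarity(query_vec, passage_vec),
    "recency": recency_score(age_days=180),
    "authority": 0.8,  # placeholder for a normalized authority estimate
}

for profile in WEIGHTS:
    print(profile, round(citation_score(signals, profile), 3))
```

With identical inputs, the two profiles produce different scores for the same page, which is the practical point: a source tuned for one model's priorities may rank differently under another's.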
