How LLMs Select Sources for Subjective Queries

Kurt Fischman
Founder, Growth Marshal


Published Feb. 2025

Key Ranking Factors Considered by LLMs

When large language models (LLMs) like ChatGPT (with browsing), Perplexity, and Google Bard (now called Gemini) answer subjective queries, they effectively act as search engines that fetch and filter web content. Below is an in-depth look at how they rank and select sources (focusing on websites and blogs), how their approaches differ, and what site owners can do to increase their chances of being cited.

1. Relevance to the Query:

LLMs prioritize pages that closely match the user’s query in topic and intent. This means content with titles or headings that include the query terms or a clear answer to the question tends to rank higher [seovendor.co]. For example, an article titled “10 Best SEO Agencies for Startups in 2025” is immediately relevant to a question about “the best SEO agency for startups.” The models also infer search intent – e.g. recognizing a query is seeking an opinion or list (a “No Right Answer” query) – and will look for content that fits that format (such as listicles or comparative blog posts) [invoca.com].
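To make "closely match the query" concrete, the Python sketch below scores how many query terms appear in a page's title and headings. It is a toy model built on stated assumptions. The stopword list, the crude plural folding, and the rule that a title match counts twice as much as a heading match are all invented for illustration. No LLM search backend is known to rank pages this way.

```python
import re

# Illustrative stopword list (an assumption); "best" is kept because it signals intent.
STOPWORDS = {"the", "a", "an", "for", "in", "of", "is", "who", "what", "to"}


def normalize(word: str) -> str:
    """Crude plural folding so "agencies" matches "agency"."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and len(word) > 3 and not word.endswith("ss"):
        return word[:-1]
    return word


def tokens(text: str) -> set[str]:
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {normalize(w) for w in words} - STOPWORDS


def relevance(query: str, title: str, headings: list[str]) -> float:
    """Share of query terms matched; a title hit counts double a heading hit."""
    q = tokens(query)
    if not q:
        return 0.0
    title_hits = q & tokens(title)
    heading_terms = set().union(*(tokens(h) for h in headings))
    heading_hits = (q & heading_terms) - title_hits
    return (2 * len(title_hits) + len(heading_hits)) / (2 * len(q))


print(relevance("best SEO agency for startups",
                "10 Best SEO Agencies for Startups in 2025", []))  # 1.0
print(relevance("best SEO agency for startups", "Our SEO Services",
                ["Pricing", "Contact"]))                            # 0.25
```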

2. Authority and Credibility of the Source:

The perceived trustworthiness of a website heavily influences ranking. LLMs (via their search backends) favor established, reputable sites and content by subject-matter experts. Domain authority plays a role, and it comes from quality backlinks and brand recognition. For instance, a tech query might surface results from TechCrunch or PCMag because these sites are well known for reliable tech content [seovendor.co]. The search engines behind ChatGPT (Bing) and Bard (Google) both assess credibility through several kinds of signals. These include link-based authority signals (PageRank, in Google's case), quality frameworks such as E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness), and spam detection [seo.com]. In practice, websites with strong reputations and link profiles are more likely than obscure blogs to be chosen for citation. Business deals reinforce the preference too. OpenAI has partnered with major publishers (AP, Reuters, Condé Nast, etc.) so that ChatGPT Search can surface authoritative information [yoast.com].
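PageRank is the best-documented link-authority signal, so here is a minimal power-iteration version of the classic published algorithm. The four-page graph is hypothetical. The point is only that a page many sites link to accumulates rank. Google's and Bing's production link signals are not public and are far more elaborate.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Classic PageRank by power iteration.

    `links` maps each page to the pages it links to. Rank held by pages with
    no outlinks is spread evenly over all pages (the usual dangling-node fix).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for src, targets in links.items():
            for t in targets:
                new[t] += damping * rank[src] / len(targets)
        rank = new
    return rank


# Hypothetical graph: three small blogs all link to one tech publication.
graph = {
    "blog-a": ["techsite"],
    "blog-b": ["techsite"],
    "blog-c": ["techsite", "blog-a"],
    "techsite": ["blog-a"],
}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:10s} {score:.3f}")
```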

3. Content Quality and Depth:

The comprehensiveness and accuracy of content are critical. LLMs assess whether a page offers in-depth information, genuine insights, and a well-structured argument. Content that thoroughly covers a topic (e.g. a long-form blog post that explores all facets of a question) is seen as more valuable than a thin article [seovendor.co]. High-ranking content also tends to be longer and more detailed, and this correlates with more frequent LLM citations [analyzify.com]. Expertise matters as well. Content written or reviewed by knowledgeable authors signals higher quality [seovendor.co]. For example, a blog post that includes data, case studies, or expert quotes will be viewed favorably. LLMs can also cross-check facts. When the same information appears across multiple reputable sources, the perceived accuracy of that content goes up [seovendor.co].

4. Structured Data and Format:

The way information is presented on a page significantly affects whether an LLM can extract and use it. Clear structure is rewarded. Content with descriptive headings, bullet points, numbered lists, or tables is easier for the AI to parse and summarize. Being mentioned in an "authoritative list" (like a top-10 list) greatly increases a site's chances of getting cited [gaoyy.com]. Such list-style articles are formatted in a way that LLMs can readily digest (minimal fluff, items ranked in order, etc.), and they answer "best X" or "top X" queries directly. LLMs and generative search engines also prefer content that is readily accessible in HTML. Many AI crawlers cannot reliably execute complex JavaScript or see hidden elements. A "lightweight" page that exposes all key text in the static HTML is therefore more likely to be used [onely.com]. (Notably, AI crawlers reportedly do not use JSON-LD or schema markup directly and rely on the rendered HTML text instead [analyzify.com]. Adding structured data is still wise for Google and Bing SEO, but the LLM itself will focus on your visible content and layout.)
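The static-HTML point can be illustrated with Python's standard-library html.parser. Like a crawler that does not run JavaScript, it only sees text that is present in the raw markup. The sample page and the choice of tags to capture (h1 to h3 and li) are assumptions for the sketch. Real AI crawlers' parsing pipelines are not public.

```python
from html.parser import HTMLParser
from typing import Optional


class StaticTextExtractor(HTMLParser):
    """Collects headings and list items exactly as they appear in raw HTML.

    Like a crawler that does not execute JavaScript, it never runs scripts,
    so text a script would inject at runtime is invisible to it.
    """

    CAPTURE = {"h1", "h2", "h3", "li"}

    def __init__(self) -> None:
        super().__init__()
        self.items: list[tuple[str, str]] = []
        self._current: Optional[str] = None
        self._text: list[str] = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._in_script = True
        elif tag in self.CAPTURE:
            self._current, self._text = tag, []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._in_script = False
        elif tag == self._current:
            # Collapse whitespace so nested tags don't leave odd gaps.
            self.items.append((tag, " ".join("".join(self._text).split())))
            self._current = None

    def handle_data(self, data):
        if self._current and not self._in_script:
            self._text.append(data)


def extract_static_text(html: str) -> list[tuple[str, str]]:
    parser = StaticTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page: two agencies are in the HTML, a third is added by JavaScript.
PAGE = """
<h1>Top SEO Agencies for Startups</h1>
<ol>
  <li><strong>Agency A</strong> - best for seed-stage teams</li>
  <li>Agency B</li>
</ol>
<p id="more"></p>
<script>document.getElementById("more").textContent = "Agency C";</script>
"""
print(extract_static_text(PAGE))
```

The third agency never appears in the output. It only exists after the script runs, which is also why a client-side-rendered list can be invisible to an AI crawler.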

5. User Engagement and Popularity Signals:

While LLMs don't have direct access to analytics, they benefit indirectly from ranking signals that may reflect user behavior. Pages that rank well on search engines are more likely to be pulled in as sources [analyzify.com], and those rankings may partly reflect engagement such as click-through rates and dwell time. In addition, content that's frequently referenced or linked to elsewhere (e.g. a blog post that many others cite) suggests community trust. LLMs may also consider cross-platform presence. If a brand or claim appears consistently on multiple reputable websites, it's a "green flag" that reinforces credibility [seovendor.co]. For example, if several tech blogs and a forum all praise the same product, an LLM is more confident in mentioning that product. User-generated signals like reviews also matter for certain queries. For a query seeking the "best" services or products, LLMs often look at online reviews and ratings on sites like G2, Capterra, Trustpilot, or Clutch, and they treat high aggregate ratings as a sign of quality [gaoyy.com]. Strong positive feedback from real users can propel a business or blog onto an AI-generated recommendation list.
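Cross-platform consensus can be approximated by counting how many distinct reputable domains mention a brand. Counting domains rather than pages means one site repeating itself does not look like independent agreement. This is a hypothetical sketch. The brands, the .example domains, and the trusted-domain list are made up, and no LLM vendor has documented using this exact rule.

```python
from collections import defaultdict
from urllib.parse import urlparse


def mention_consensus(mentions: list[tuple[str, str]],
                      trusted_domains: set[str]) -> dict[str, int]:
    """Count the distinct trusted domains that mention each brand.

    Several pages on one domain count once, so a single site repeating a
    claim does not look like independent corroboration.
    """
    seen = defaultdict(set)
    for url, brand in mentions:
        host = (urlparse(url).hostname or "").removeprefix("www.")
        if host in trusted_domains:
            seen[brand].add(host)
    return {brand: len(hosts) for brand, hosts in seen.items()}


# Hypothetical (page URL, brand mentioned) pairs gathered from search results.
MENTIONS = [
    ("https://www.techblog.example/best-tools", "ToolX"),
    ("https://techblog.example/another-post", "ToolX"),
    ("https://reviews.example/top-10", "ToolX"),
    ("https://forum.example/thread/1", "ToolX"),
    ("https://reviews.example/top-10", "ToolY"),
    ("https://spam.example/ad", "ToolY"),
]
TRUSTED = {"techblog.example", "reviews.example", "forum.example"}
print(mention_consensus(MENTIONS, TRUSTED))  # {'ToolX': 3, 'ToolY': 1}
```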

6. Technical SEO Elements:

Good technical health allows a site's content to be discovered and used by LLMs. This includes having a fast-loading, mobile-friendly site (which search indexes favor) and using proper HTML tags (title, meta description, header tags) to describe your content. For instance, an optimized meta description that succinctly summarizes a page can become the snippet text in search results, and LLMs do pay attention to these snippets. ChatGPT's Bing search, for example, will read the page title and description to decide if a result looks relevant before clicking [seovendor.co]. A clear, relevant snippet increases the odds the AI selects that result. Additionally, sites should allow crawling by AI agents. Blocking OpenAI's ChatGPT-User or OAI-SearchBot agents (or other relevant bots) in robots.txt can prevent your content from being seen [firt.dev]. Other technical factors further ensure the AI doesn't skip the page as low-quality or inaccessible. These include HTTPS security, a lack of spammy elements, and the absence of intrusive pop-ups. In summary, following standard technical SEO best practices (clean code, sitemaps, no crawl barriers, etc.) creates a foundation where LLMs can easily reach and read your content.
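A quick way to check whether your robots.txt blocks AI agents is Python's built-in urllib.robotparser. GPTBot, OAI-SearchBot, and ChatGPT-User are user-agent tokens that OpenAI publishes for its crawlers. The sample rules and URL below are hypothetical, and other vendors' bots would need their own tokens.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opts out of OpenAI's training crawler (GPTBot)
# while leaving the search and browsing agents free to fetch public pages.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def crawl_permissions(robots_txt: str, url: str,
                      agents: list[str]) -> dict[str, bool]:
    """Report whether each user agent may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


print(crawl_permissions(ROBOTS_TXT,
                        "https://example.com/blog/best-seo-agencies",
                        ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]))
# {'GPTBot': False, 'OAI-SearchBot': True, 'ChatGPT-User': True}
```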

7. Freshness (Recency):

Especially for subjective queries that evolve over time (e.g. "best X in 2025"), LLMs consider content freshness. They tend to prioritize recent information when the query implies up-to-date answers are needed. ChatGPT's browsing tool, for instance, will include the current year or latest keywords in its search if it suspects newer data is better [seovendor.co]. Google's Bard (and SGE) similarly emphasize fresh content for queries like latest recommendations, often surfacing recently updated pages. In practice, a blog post updated this year titled "Top SEO Agencies for Startups in 2025" would likely outrank a similar list from 2019, all else being equal. When a question asks for "the best now," fresh content tends to get priority on every platform, because LLMs don't want to cite stale info [analyzify.com]. That said, for historical or evergreen subjective queries, authoritative evergreen content can still win out. The key is ensuring your site has up-to-date info when relevance demands it.
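One simple way to model "prefer recent content when the query is time-sensitive" is exponential time decay. The 180-day half-life and the multiplicative blend with relevance are assumptions made for this sketch. None of these platforms has published a freshness formula.

```python
from datetime import date


def freshness_weight(last_updated: date, today: date,
                     half_life_days: float = 180.0) -> float:
    """Exponential decay: the weight halves every `half_life_days`."""
    age_days = max((today - last_updated).days, 0)
    return 0.5 ** (age_days / half_life_days)


def blended_score(relevance: float, last_updated: date, today: date,
                  time_sensitive: bool) -> float:
    # Evergreen queries ignore age; time-sensitive ones discount older pages.
    if not time_sensitive:
        return relevance
    return relevance * freshness_weight(last_updated, today)


today = date(2025, 2, 1)
new_list = blended_score(0.90, date(2025, 1, 15), today, time_sensitive=True)
old_list = blended_score(0.95, date(2019, 6, 1), today, time_sensitive=True)
print(new_list > old_list)  # True
```

For an evergreen query (`time_sensitive=False`), the older but slightly more relevant page would win instead. This mirrors the caveat about evergreen content above.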


How Different LLMs Weigh These Factors

Not all LLM-based search systems treat the above factors equally. Each has a unique way of fetching and filtering sources:

ChatGPT (Browsing with Bing)

When ChatGPT “browses” the web, it essentially leverages Bing’s search index and ranking. This means many Bing SEO factors influence what it sees first – page relevance, SEO titles, backlinks, etc., much like a normal Bing search [seovendor.co]. ChatGPT typically formulates a precise search query (often using multiple keywords from the user prompt) to get targeted results [seovendor.co]. It then scans the top results and their snippets, clicking those that seem most on-point. In doing so, ChatGPT appears to favor credible domains and pages that directly answer the query. It has an internal sense of source quality; as noted, it will “opt for sources recognized for their specialization” in the topic at hand [seovendor.co]. For example, if asked about a medical issue, it might prioritize a result from Mayo Clinic over an unknown blog. Once it opens a page, the LLM evaluates the content’s usefulness. ChatGPT often cross-verifies facts by consulting multiple sources before formulating an answer [seovendor.co]. If two or three reputable sites all mention the same key point, that point carries more weight in ChatGPT’s answer. Overall, ChatGPT’s source selection skews toward what Bing ranks highly and what the LLM judges as trustworthy and relevant on quick inspection. (If a site has a misleading title or low-quality content, the AI may abandon it and move to another result.)

Perplexity AI

Perplexity is an "answer engine" that also performs live web search, but it has its own retrieval approach. Perplexity's backend reportedly draws on Google Search data [analyzify.com], and it employs LLM reasoning to sift through results. Perplexity's algorithm is also reported to give extra weight to structured, factual information. It "highly prioritizes authoritative list mentions" and straightforward data points when selecting sources [gaoyy.com]. In other words, if your website is featured in a well-regarded top-10 list, Perplexity is very likely to surface that, even more so than ChatGPT or Bard. It also considers awards, certifications, and affiliations as strong credibility markers [gaoyy.com]. A company or product that has won notable awards or is partnered with respected organizations will stand out in Perplexity's eyes, because those accolades are presented in an organized, verifiable way, often as short statements or lists. Additionally, Perplexity heavily factors in online review aggregators and databases for queries about "the best X." It pulls factual info from sites like Wikipedia, Bloomberg, and Crunchbase. For qualitative insights it draws on review sites such as Clutch, G2, Capterra, and TrustRadius [gaoyy.com]. For example, in a query about a software tool, Perplexity might cite the tool's star rating or number of reviews from G2 alongside an informational source. Compared to other LLMs, Perplexity's selection is the most data-driven and list-driven, since it tries to find consensus from ranked lists, stats, and user feedback. It is also reported to use simpler ranking algorithms with just a few key factors: lists, awards, and reviews for general queries, and local business profiles for local queries. That simplicity makes these specific signals very influential in what it cites [gaoyy.com]. In summary, to get cited by Perplexity, being present in structured data sources (lists, databases, reviews) is crucial.
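Because Perplexity is reported to rely on a few dominant signals, its behavior can be caricatured as a small weighted score. Everything here is an assumption. The `Signals` fields, the caps, and the weights are invented to mirror the emphasis described above (lists first, then reviews and awards). None of them is taken from Perplexity.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    list_mentions: int  # appearances in third-party "top X" lists
    awards: int         # verifiable awards, certifications, affiliations
    avg_rating: float   # review-site average on a 0-5 scale
    review_count: int


# Invented weights that mirror the emphasis described above; not Perplexity's.
WEIGHTS = {"lists": 0.5, "awards": 0.2, "reviews": 0.3}


def signal_score(s: Signals) -> float:
    lists = min(s.list_mentions, 5) / 5  # capped: mentions beyond five add nothing
    awards = min(s.awards, 3) / 3
    # A rating only counts fully once there are enough reviews to trust it.
    reviews = (s.avg_rating / 5) * min(s.review_count, 50) / 50
    return (WEIGHTS["lists"] * lists
            + WEIGHTS["awards"] * awards
            + WEIGHTS["reviews"] * reviews)


listed = Signals(list_mentions=4, awards=1, avg_rating=4.8, review_count=120)
unlisted = Signals(list_mentions=0, awards=0, avg_rating=5.0, review_count=3)
print(signal_score(listed) > signal_score(unlisted))  # True
```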

Google Bard (and SGE)

Google Bard is integrated with Google Search, so it inherits Google’s core ranking system and then adds a generative layer on top. Bard’s answers for subjective questions (so-called “NORA” – No Right Answer queries) are synthesized from multiple sources, and Google has stated that Bard’s summaries are “rooted in our core Search ranking and quality systems.” All the usual Google ranking factors (PageRank link analysis, helpful content, freshness, etc.) still apply to determine which webpages are considered [seo.com]. However, Bard/SGE doesn’t simply take the #1 result and read it out – it often pulls in 3-5 different sources on average to construct an answer [onely.com]. In doing so, it may include sources that rank lower on the normal SERP if they add unique value. In fact, one study found 43% of SGE (Bard) cited sources were not even in the top 10 Google results for that query (and only 17% came from the old top 1-3) [onely.com]. This means Bard is looking beyond just popularity – it tries to cover diverse perspectives or subtopics. For example, for “best SEO agency for startups,” Bard might take one source that lists affordable agencies, another that lists top-rated ones, and another that gives an expert opinion, combining insights. Still, those sources must meet Google’s quality bar (authoritative, relevant, trusted). Bard also heavily favors pages that are easy to crawl. Content that is readily available without logins or heavy scripts stands a better chance. As noted, Google’s SGE “prefers lightweight websites… content readily available in the source HTML” [onely.com]. So, a blog that loads its text immediately is more likely to be used than a site where the content is buried behind interactive elements. Bard will additionally use context like the user’s location or time if relevant to tailor the answer [invoca.com] (e.g. “best SEO agency” might yield different examples if you’re in the UK vs. US, incorporating a local firm if appropriate). 
In essence, Bard’s source selection mirrors Google’s breadth and emphasis on authority + diversity: it picks a mix of high-authority and niche sources to provide a well-rounded answer, all filtered through Google’s standard of credibility.
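The "mix of sources" behavior resembles Maximal Marginal Relevance (MMR), a standard retrieval technique that balances relevance against redundancy with sources already chosen. Google has not said it uses MMR. The technique is swapped in here only to show the idea, using Jaccard overlap of hypothetical keyword sets rather than the learned representations a production system would use.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_select(query: set[str], docs: dict[str, set[str]], k: int = 3,
               lam: float = 0.5) -> list[str]:
    """Maximal Marginal Relevance: pick k docs, trading relevance for novelty.

    Each step takes the doc maximizing
    lam * sim(doc, query) - (1 - lam) * max sim(doc, already picked).
    """
    selected: list[str] = []
    remaining = sorted(docs)  # sorted so ties break deterministically
    while remaining and len(selected) < k:
        def score(name: str) -> float:
            redundancy = max((jaccard(docs[name], docs[s]) for s in selected),
                             default=0.0)
            return lam * jaccard(docs[name], query) - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical keyword sets for four candidate pages.
QUERY = {"best", "seo", "agency", "startup"}
DOCS = {
    "list-a": {"best", "seo", "agency", "startup", "top", "10"},
    "list-a-copy": {"best", "seo", "agency", "startup", "top", "ten"},
    "affordable": {"affordable", "seo", "agency", "startup", "budget"},
    "expert-opinion": {"expert", "opinion", "seo", "startup", "growth"},
}
print(mmr_select(QUERY, DOCS))  # ['list-a', 'affordable', 'expert-opinion']
```

With these inputs the sketch picks the first list, then the affordable-agencies page and the expert opinion. It skips the near-duplicate list even though that page is as relevant as the first one.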

Comparative Differences: In summary, ChatGPT’s browsing tends to stick closely to whatever Bing serves up front (with a bias toward very authoritative sites and corroborated info). Perplexity is more explicitly focused on facts, lists, and collective opinions (it might cite a wider array of sources, including niche review sites or knowledge bases, to directly answer each facet of a query). Bard leans on Google’s sophisticated ranking and then diversifies the sources, ensuring a spread of trustworthy content. All three value relevance and authority, but Perplexity might overlook a flowery blog article in favor of a data table, whereas Bard might include that blog if it adds a unique angle. ChatGPT sits somewhere in between – it values clarity and will use a well-written blog, but generally only if it’s already deemed reputable or if needed to fill a gap in information.

Actionable Steps to Increase the Chances of Being Cited

If you run a website or blog, you can take concrete steps to optimize for these AI-driven systems. Below is a comprehensive list of actions (spanning content, SEO, and outreach) to boost your probability of being referenced by an LLM in its answer:

1. Publish High-Quality, In-Depth Content:

Focus on creating comprehensive articles that thoroughly cover your topic. LLMs gravitate toward pages that read like authoritative resources. Aim for depth – cover all subtopics, answer common questions, and include original insights or data. This establishes your content as a go-to reference [seovendor.co], [totheweb.com]. For example, a blog post titled “The Ultimate Guide to Startup SEO” that spans strategy, case studies, tools, and FAQs is more likely to be cited than a shallow 500-word post on the same topic. Long-form content (1500+ words when appropriate) with clear structure helps the AI understand and trust your page.

Tip: Break up content with descriptive headings and sections – a logical outline not only aids human readers but also allows LLMs to identify relevant parts to quote or summarize [totheweb.com].

2. Use Clear, Relevant Titles and Meta Descriptions:

Craft your page titles to match search intent and include keywords users might query. The title is often the first thing the AI (via the search engine) sees, and a highly relevant title can improve your ranking and click-through. For instance, instead of a vague title like “Our SEO Services,” a blog post could be titled “Top 5 SEO Agencies for Startups – 2025 Review.” This directly signals relevance to that query [seovendor.co]. Along with titles, write concise meta descriptions that summarize the key answer or points of your page. LLMs sometimes utilize these snippets in deciding which result to click [seovendor.co]. Think of the meta description as your elevator pitch to the AI and the user – it should be specific and enticing. E.g., “Looking for the best SEO agency for your startup? Here’s a data-backed review of the top 5 agencies specialized in startup growth (pricing, client results, and more).” This tells the AI your page likely contains the answer structure it needs.
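Before publishing, you might sanity-check a title and meta description against the target query. The length bounds in this sketch are 10 to 60 characters for titles and 50 to 160 for descriptions. They are common SEO rules of thumb and assumptions, not limits published by Bing, Google, or any LLM vendor. The helper names are hypothetical.

```python
import re

STOPWORDS = {"the", "a", "an", "for", "in", "who", "is", "of"}


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def snippet_check(title: str, meta_description: str,
                  query: str) -> dict[str, bool]:
    """Rough pre-publish checks on the title and description an engine may show.

    The length bounds are common SEO rules of thumb (assumptions), not limits
    published by Bing, Google, or any LLM vendor.
    """
    query_terms = words(query) - STOPWORDS
    return {
        "title_length_ok": 10 <= len(title) <= 60,
        "description_length_ok": 50 <= len(meta_description) <= 160,
        "title_covers_query": query_terms <= words(title),
    }


print(snippet_check(
    "Top 5 SEO Agencies for Startups - 2025 Review",
    "A data-backed review of the top 5 SEO agencies for startups, "
    "with pricing and client results.",
    "top SEO agencies for startups",
))
```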

3. Demonstrate E-E-A-T and Build Authority:

Work on improving the overall authority of your site in your domain. This involves both on-site and off-site efforts:

On your site, showcase author credentials (bios highlighting expertise), cite credible sources in your content, and ensure information is accurate and well-edited. Over time, this builds trust with search algorithms.
Off-site, earn quality backlinks and mentions from respected websites in your industry. Backlinks remain a core signal for authority. Getting your blog posts linked by known industry blogs, news outlets, or resource pages will raise your domain authority, which in turn makes LLMs more inclined to trust and cite your content [seovendor.co]. Prioritize natural link-building (e.g., guest posting, PR campaigns, partnerships) over spammy tactics. Search engines such as Google and AI systems such as ChatGPT are increasingly adept at distinguishing genuine endorsements from link schemes [seovendor.co].
Align with reputable entities: If possible, get your site included in respected directories or as a partner to well-known organizations. Being listed on .edu or .gov resources, industry associations, or receiving testimonials from recognized experts can boost your credibility signal [seovendor.co].
Use testimonials and reviews: Featuring real testimonials, case studies, or reviews on your site (with real names or companies) can help indirectly, because it shows your legitimacy, and these might get picked up elsewhere.

4. Leverage Structured Content (Lists, Tables, FAQs):

Format your content in ways that are easily digestible. As noted, authoritative list posts are gold. If it suits your content strategy, create listicles or rankings (e.g. "Top 10 Tools for X") on your blog [gaoyy.com]. Make sure these lists are well-researched and unbiased so that others might link to them as well. Similarly, include FAQ sections that answer common questions concisely. FAQ schema helps search engines understand those sections, although Google now shows FAQ rich results mainly for well-known government and health sites. LLMs don't read the schema directly, but they can still extract the visible answers. Use tables or comparison charts where appropriate to present data clearly. The easier it is for an AI to pull a fact or recommendation from your page in a self-contained snippet, the more likely it will do so. Also, use proper HTML structure for headings, lists (<ul> or <ol> tags), and tables, and avoid presenting important info only in images or scripts. Remember that many current AI crawlers rely on plain HTML content and ignore JavaScript-rendered content and metadata [analyzify.com]. So, for example, don't hide your "top 5" list behind a click-to-expand widget. It should be visible in the source HTML so an AI crawler sees it immediately.
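A minimal sketch of the FAQ advice emits the same question-and-answer pairs twice. The visible HTML version is what text-reading AI crawlers can use, and the schema.org FAQPage JSON-LD version is for search engines. The FAQPage, Question, and Answer types come from schema.org. The helper functions and the sample question are hypothetical.

```python
import json
from html import escape


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """schema.org FAQPage markup for a <script type="application/ld+json"> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


def faq_html(pairs: list[tuple[str, str]]) -> str:
    """The same Q&A as visible HTML, which is what a text-reading crawler sees."""
    parts = ["<h2>Frequently Asked Questions</h2>"]
    for question, answer in pairs:
        parts.append(f"<h3>{escape(question)}</h3>\n<p>{escape(answer)}</p>")
    return "\n".join(parts)


# Hypothetical FAQ content.
FAQ = [("How long does startup SEO take to show results?",
        "It depends on competition, site history, and publishing cadence.")]
print(faq_html(FAQ))
print(faq_jsonld(FAQ))
```

Keeping the visible text and the JSON-LD generated from one source avoids the two drifting apart.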

5. Implement Technical SEO Optimizations:

Ensure your site is crawler-friendly and performant. Key actions include:

Enable crawling: Do not disallow important sections of your site in robots.txt. Specifically, allow OpenAI's ChatGPT-User and OAI-SearchBot agents and other AI crawlers to access your content (unless you have a reason to opt out) [firt.dev].
Improve page speed and mobile usability: Fast, responsive sites provide better user experience and are favored by Google’s index – indirectly helping your chance with Bard. More importantly, a quick-loading page ensures that an AI agent can retrieve your content swiftly within its time constraints. Heavy, slow pages might be skipped or partially loaded.
Minimize reliance on client-side scripts: As mentioned, if your content only appears after running complex scripts (client-side rendering), an LLM might not see it. Use server-side rendering for critical content or provide static HTML fallbacks. In practice, this might mean using simpler frameworks or prerendering content for pages that you want search/AI to pick up [onely.com].
Use schema markup wisely: Add structured data (JSON-LD schemas for things like articles, FAQs, products, reviews) to help search engines better index and understand your content (this can improve your chances of appearing in rich results, which are often a source for SGE/Bard snippets). Keep in mind the LLM likely isn’t reading the schema directly, but the schema can boost your SEO performance which feeds into LLM selection. Also ensure your Knowledge Graph presence – for instance, having a Wikipedia page or being listed in Google’s Knowledge Panel – as this can dramatically increase trust and visibility. Getting into those typically means establishing notability in your field (through PR coverage, citations, etc.).
Monitor indexing: Regularly check Google Search Console and Bing Webmaster Tools to ensure your important pages are indexed. If search engines can't index a page, an LLM that relies on them won't find it either.
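An XML sitemap with accurate lastmod dates helps search engines discover pages and notice updates. The sketch below builds one with Python's standard library, following the sitemaps.org protocol. The URLs and dates are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: dict[str, date]) -> str:
    """Return sitemap.xml text listing each URL with its last-modified date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in sorted(pages.items()):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()  # YYYY-MM-DD
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


SITEMAP = build_sitemap({
    "https://example.com/blog/best-seo-agencies-for-startups": date(2025, 1, 15),
    "https://example.com/blog/startup-seo-guide": date(2024, 11, 2),
})
print(SITEMAP)
```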

6. Keep Content Fresh and Updated:

For content that targets queries where freshness matters (such as "best [year]" lists, product reviews, statistics, etc.), update it regularly. An LLM is more likely to cite "The Best X of 2025" if it's actually updated in 2025 and reflects current info [analyzify.com]. When you make substantive updates, refresh the "last updated" date or add a visible note like "Updated Jan 2025" so the change is clear. Don't just change the date without changing the content. Continually adding new relevant content to your site (new blog posts on trending questions in your niche) also increases the chances that one of your pages will match a future query. Essentially, be the source with the most up-to-date insights. AI search will avoid serving outdated recommendations if newer ones exist. (For example, if your blog post about top SEO agencies was last updated in 2020, and competitors have 2024 updates, the AI will likely pick the newer sources.)
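A periodic audit can surface pages that need a refresh. This sketch flags URLs whose title names a past year or whose last substantive update is more than a year old. The threshold, the year regex, and the sample pages are assumptions. Evergreen pages it flags may simply need a review rather than a rewrite.

```python
import re
from datetime import date

YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def pages_to_refresh(pages: dict[str, tuple[str, date]], today: date,
                     max_age_days: int = 365) -> list[str]:
    """Flag URLs whose title names a past year or that haven't been updated recently.

    `pages` maps URL -> (title, date of last substantive update). The one-year
    threshold is an arbitrary assumption; tune it to your niche.
    """
    flagged = []
    for url, (title, last_updated) in sorted(pages.items()):
        past_year_in_title = any(int(y) < today.year for y in YEAR.findall(title))
        too_old = (today - last_updated).days > max_age_days
        if past_year_in_title or too_old:
            flagged.append(url)
    return flagged


# Hypothetical site inventory.
PAGES = {
    "/best-seo-agencies-2025": ("Top SEO Agencies for Startups in 2025", date(2025, 1, 10)),
    "/best-seo-agencies-2020": ("Top SEO Agencies for Startups in 2020", date(2020, 3, 1)),
    "/seo-glossary": ("SEO Glossary", date(2023, 6, 1)),
}
print(pages_to_refresh(PAGES, date(2025, 2, 1)))
# ['/best-seo-agencies-2020', '/seo-glossary']
```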

7. Encourage and Manage Reviews & Testimonials:

If you are a service, product, or content provider that could be subject to “best of” queries, invest in your online reviews. Solicit reviews on platforms that matter in your industry (Google reviews for local businesses, G2/Capterra for software, Clutch for agencies, etc.). Consistently high ratings and a healthy number of reviews can propel you onto “top rated” lists that LLMs reference [gaoyy.com]. For instance, if an AI sees your company has 100+ positive reviews on Clutch and appears in a “Top Marketing Agencies” list there, it’s more likely to mention you as a leading option. Actionable steps include asking satisfied clients to leave reviews, and making sure your profiles on these sites are complete and up-to-date. Also, monitor your reputation – respond to reviews and address negative feedback. Remember that LLMs summarize consensus; a pattern of bad reviews will steer them away from citing your brand as “the best.”
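Why does review volume matter alongside a high score? A common way to rank items by rating is a Bayesian average, which pulls an average toward a prior until enough reviews accumulate. No LLM vendor has said it computes this. The prior mean of 4.0 and weight of 20 are assumptions, chosen to show how 100 reviews at 4.9 can outrank 3 reviews at 5.0.

```python
def bayesian_rating(avg: float, count: int, prior_mean: float = 4.0,
                    prior_weight: int = 20) -> float:
    """Average rating shrunk toward a prior until enough reviews accumulate.

    Equivalent to adding `prior_weight` phantom reviews scored `prior_mean`.
    """
    return (prior_weight * prior_mean + count * avg) / (prior_weight + count)


print(round(bayesian_rating(4.9, 100), 2))  # 4.75
print(round(bayesian_rating(5.0, 3), 2))    # 4.13
```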

8. Publicize Awards, Certifications, and Achievements:

Promote any accolades your website or business earns, and do so in a way that’s crawlable. If you win an industry award (e.g. “Best Startup SEO Agency 2024”), publish a blog post or press release on your site about it and try to get it mentioned on the award organizer’s site (they often list winners). These honors serve as strong quality signals to AI. As research indicates, LLMs take note of awards, certifications, and affiliations since these signify a company/product is “vetted” in some way [gaoyy.com]. Display badges or mention memberships (like BBB accreditation, top developer on GitHub, etc.) on your site. Make sure the text of the award name and year is present (not just an image) so it’s picked up. This increases the likelihood that when an LLM searches for “best [category]”, your name appears with a noted credential, boosting your chance to be included in the answer.

9. Get Listed in Relevant Directories and Listicles:

Don't wait for others to include you. Be proactive about getting on "best of" lists. This might mean pitching your company or content to bloggers who write annual "top 10" round-ups, or getting included in business directories for your niche. Submit your business info to authoritative databases like Wikipedia (if notable), Crunchbase, or industry-specific directories. These sources feed the knowledge that LLMs draw upon [gaoyy.com]. For example, ensure your startup is on Crunchbase and AngelList. If you're a local service, claim your Google Business Profile (formerly Google My Business), your Yelp listing, and similar profiles. Many AI queries for "best X in [location]" will incorporate Google Maps/business data and review averages [gaoyy.com]. Being present and well-reviewed on those platforms makes you a candidate to be cited. Also, if your website has content that could serve as reference material, consider contributing it to Q&A sites or forums (when appropriate) with a link back to your detailed blog post. This can create additional entry points for the AI to find your content.

10. Write for Humans (Engagement) but Structure for Machines:

Ultimately, user engagement will indirectly influence AI selection. Create content that people find valuable and share – high engagement can lead to higher organic rankings and more references across the web. At the same time, keep your writing clear and your formatting semantic. Use a conversational tone where appropriate (LLMs are trained on natural language, so content that reads naturally can sometimes be parsed more accurately) [totheweb.com]. Avoid fluff and get to the point in each section, as LLMs might truncate or summarize your content – make sure the key message is not buried. By balancing readability with structured delivery of facts, you cater to both your human audience and the AI algorithms.

Each of these steps works synergistically. For example, publishing great content (step 1) can earn you backlinks (step 3) and land you on someone’s top-10 list (step 9), which in turn gives the LLM more reason to cite you. Think of “AI SEO” as an extension of good SEO and content marketing practices, with extra emphasis on structure and credibility.

Example Analysis: “Who is the best SEO agency for startups?”

To illustrate how the above factors come together, let’s walk through how an LLM might handle this subjective query and decide which website or blog to cite:

When a user asks “Who is the best SEO agency for startups?”, the AI knows this is an opinion-based query seeking a recommendation. It will likely perform a search for startup-focused SEO agency rankings or reviews. It might start with a query like “best SEO agencies for startups” and retrieve a variety of results – perhaps a few blog articles listing top agencies, maybe a Clutch.co page for “Top Startup SEO Firms”, and possibly some Q&A discussions. Now, how does it choose what to cite?

Content Relevance & Format: Suppose one of the top search results is a blog post titled “5 Best SEO Agencies for Startups (2024 Update)” – this is immediately attractive to the LLM because the title exactly matches the query intent and it’s formatted as a list [smash.vc]. On clicking it, the AI finds a numbered list of five agencies with a brief description of each. This structured layout directly answers the question (it essentially says “here are the best agencies”). Because the content is neatly organized, ChatGPT or Bard can easily pull from it. For instance, the blog might list “1. Smash Digital – Best for Scaling Growth” [smash.vc]. The AI could cite that as evidence, e.g., “Smash Digital is often ranked as the top SEO agency for startups, noted for its focus on scaling growth [smash.vc].” The structured list gave the AI a clear candidate for “the best agency.”
Authority & Cross-Verification: The LLM won’t rely on just one blog’s word, especially for a superlative claim like “the best.” It will look for consensus and credibility. It might see that Smash Digital (from the blog above) also appears on other lists or has an excellent reputation on a review site. If a second independent source (say, another article or a Quora answer) also mentions “Smash Digital” or perhaps “WebFX” or others as top startup SEO agencies, the AI gains confidence that these names are reputable. It might then frame the answer as a comparison or a selection of top contenders, citing multiple sources. For example, Perplexity might say: “According to industry rankings and client reviews, Smash Digital, Boostability, and WebFX are among the best SEO agencies for startups [smash.vc]. In particular, Smash Digital is praised for scaling growth, and it maintains a 4.9/5 client rating on Clutch [gaoyy.com].” Here, one citation is the blog list (supporting the inclusion of those names), and another citation could be Clutch confirming the rating. The AI chose to cite the Clutch data because user reviews are a strong quality signal – Clutch is an authoritative source for agency ratings, and a high score there validates the claim that the agency is “one of the best” [gaoyy.com].
Awards & Third-Party Validation: Suppose one agency on the list had a unique attribute, such as being the "2023 Startup SEO Awards Winner." If that fact is mentioned (perhaps on the agency's own site or in a press release), the AI might bring it up, as awards are persuasive. The agency's site that displays "Winner of Best Startup SEO Agency 2023" would be taken into account, but the AI is more likely to trust a third-party mention of that award. If an authoritative site or news piece confirms Agency X won an award, that might get cited [gaoyy.com]. In our scenario, Agency X might not appear on any "top 5" lists but might have won a big award last year. The AI could then include a line like "Agency X was recognized with the Startup SEO Excellence Award in 2023," citing a source for that accolade, to give a balanced answer. Typically, though, the agencies that get named have multiple signals (good write-ups, awards, reviews) reinforcing each other.
Why one site gets cited over another: Imagine two blog posts both list “the best startup SEO agencies.” One is on a well-known marketing blog that has high authority and the list is detailed; the other is a mediocre article on a brand-new blog with little substance. The LLM will favor the former. It looks at the quality of content (is it informative or just a sales pitch?), the authority of the host domain, and sometimes the presence of original insights. If Blog A provides unique reasoning (“Agency Y is best for early-stage startups because of XYZ proven results”) and Blog B just says “These are the best [names only]”, Blog A’s content is richer for the AI to use. Also, if Blog A’s content has proper HTML structure (clear headings for each agency, maybe an FAQ at the bottom), it’s even easier to extract. Blog B might be ignored if it adds no new value or appears less trustworthy.
The role of SEO in this example: Traditional SEO ensured that the high-quality blog post was ranking on the first page for the AI to find. That blog likely did keyword research (“SEO agencies for startups”), optimized its title and content, got some backlinks, etc. Because of that, Bard or ChatGPT found it in the first place. Then the LLM’s own selection criteria (as discussed) kicked in to decide it was worth citing. Similarly, the agencies that got featured probably worked on their reputations – they might have encouraged happy clients to review them on Clutch or curated case studies that got them mentioned on SaaS blogs. Those efforts result in the signals that the AI picks up (high ratings, multiple mentions).

In the end, an LLM’s answer to “Who is the best SEO agency for startups?” might not declare a single winner outright (since “best” is subjective), but it will likely highlight a few top contenders with reasoning, drawn from the web sources. For example, a final answer could be:

“Several SEO agencies are highly recommended for startups. Smash Digital is frequently cited as a top choice for scaling organic growth [smash.vc], and it holds a 4.9/5 rating based on startup client reviews on Clutch [gaoyy.com]. Boostability and WebFX are also noted in industry lists for their track record with small businesses [smash.vc]. Each of these agencies has proven experience in startup SEO, though the ‘best’ ultimately depends on your specific needs and budget.”

Here, the answer name-drops a few agencies and backs up each with a source: the blog list for context and the review platform for credibility. The sites that got cited (Smash Digital via the Smash.vc blog, and Clutch) had the right mix of relevance, authority, and clear info. They “earned” their citations by being the kinds of sources LLMs look for – expert-curated lists and data-driven review pages.

Why others might not be cited: If an agency has only its own website claiming "We are the best!", the LLM will likely discount that, since the claim is self-serving and not validated by third parties. If a blog post is too spammy or reads like an advertisement, it won't be chosen. LLMs are getting better at assessing quality, much as search engines have (e.g. Google's helpful content update). So only genuinely useful, credible content surfaces to the top for these queries.

In conclusion, getting cited by AI search models requires thinking like a search engine and an answer engine. It’s about solid SEO fundamentals (so that your site is visible to the AI in the first place), plus structuring your content and building your online reputation in a way that appeals to an LLM’s selection criteria. By focusing on relevance, authority, clarity, and credibility, website and blog owners can position their content to be the source that an LLM chooses to quote when responding to users’ subjective questions. The landscape is evolving, but one thing is clear: quality and trustworthiness of content are paramount – in AI-driven search just as much as traditional search [seo.com]. By following the actionable steps and understanding how different LLMs prioritize information, you can increase your chances of being featured as a cited source in this new era of “generative” search results.

Kurt Fischman is the founder of Growth Marshal and is an authority on organic lead generation and startup growth strategy.

Growth Marshal is the #1 SEO Agency For Startups. We help early-stage tech companies build organic lead gen engines. Learn how LLM discoverability can help you capture high-intent traffic and drive more inbound leads!