
How do AIs choose their sources?

You may have already asked ChatGPT, Gemini or Perplexity a question and received a well-structured, well-sourced, almost reassuring answer. But how do these tools decide what to cite and what to leave out? This is no trivial matter. It determines what millions of people read every day without ever clicking on a link.

AI doesn’t browse the web in the same way you do. It doesn’t read a page from top to bottom, nor does it form a general impression of the site. It breaks things down, evaluates them, compares them, and then puts them back together. Understanding this process is essential, whether you’re simply a curious reader or someone who produces online content.

AI: how it breaks down and evaluates content

Contrary to what you might think, generative AI does not analyse an entire page as a coherent whole. It breaks the page down into self-contained blocks, or ‘chunks’, which it then evaluates independently of one another. A clear and well-written passage may therefore be selected, even if it appears in the middle of a page that is otherwise average. Conversely, an entire page that ranks highly on Google may be completely ignored if its content appears too promotional or too vague.
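
To make the idea of independent chunks concrete, here is a minimal sketch in Python. It assumes that chunks are simply paragraphs separated by blank lines and that relevance is a crude word overlap with the question. Real systems do not publish their method and most likely rely on far richer signals, so the function names and the scoring rule below are illustrative assumptions only.

```python
import re


def split_into_chunks(page: str) -> list[str]:
    """Split a page into blocks separated by blank lines (assumed chunking rule)."""
    return [block.strip() for block in re.split(r"\n\s*\n", page) if block.strip()]


def score_chunk(chunk: str, question: str) -> float:
    """Toy relevance score: share of the question's words that appear in the chunk."""
    question_words = set(re.findall(r"\w+", question.lower()))
    chunk_words = set(re.findall(r"\w+", chunk.lower()))
    if not question_words:
        return 0.0
    return len(question_words & chunk_words) / len(question_words)


def best_chunk(page: str, question: str) -> str:
    """Return the single chunk that scores highest, ignoring the rest of the page."""
    chunks = split_into_chunks(page)
    return max(chunks, key=lambda c: score_chunk(c, question), default="")


# Hypothetical page: one useful passage surrounded by promotional filler.
page = (
    "Buy now! Best deals ever.\n\n"
    "Schema.org markup tells a crawler who wrote a page.\n\n"
    "Contact us today."
)
print(best_chunk(page, "Who wrote the page?"))
```

Each block is scored on its own, which is why the useful middle paragraph is picked even though the promotional text around it scores nothing.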

This approach fundamentally changes what matters. Whereas a traditional search engine might, by default, rank a decent page in the absence of a better alternative, a generative AI will simply disregard it if it judges that another snippet provides a better answer to the question asked. Tolerance for inaccurate content is virtually non-existent.

Even before this assessment takes place, the AI must first be able to crawl your content. Crawling errors, technically inaccessible pages and excessively long loading times block content ingestion at an early stage, before the question of reliability even arises. A snippet that takes too long to retrieve will simply be abandoned in favour of a more responsive source.
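
One accessibility condition you can check yourself is whether your robots.txt file lets a given crawler in at all. The sketch below uses Python's standard `urllib.robotparser` module on an inline robots.txt, so it needs no network access. The user agent `ExampleAIBot` and the URLs are hypothetical placeholders, and this check covers only crawl permissions, not loading times or server errors.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: one named bot is kept out of /private/,
# every other crawler may fetch everything.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /private/

User-agent: *
Allow: /
"""

for url in ("https://example.com/blog/post", "https://example.com/private/report"):
    print(url, can_crawl(ROBOTS_TXT, "ExampleAIBot", url))
```

A page blocked at this stage never reaches the point where its reliability is assessed, however good its content may be.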

The tangible signs that build trust

Once the content has been made accessible and broken down into segments, the AI looks for objective indicators to assess its reliability. The first of these indicators is the presence of sources. According to an analysis published by HubSpot, citing your sources improves the visibility of content by more than 130% in the responses generated by chatbots. This figure ties in with a principle that Google has long applied to its own search ranking: trust remains a key criterion, whether we’re talking about a traditional search engine or a generative one.

The second factor relates to the perceived authority of the content. According to the same analysis, a clear demonstration of expertise, with well-supported rather than vague claims, increases the visibility of a piece of content by nearly 90% in these same responses. You can therefore see why a text that merely makes assertions without ever explaining them struggles to convince an AI, just as it would struggle to convince a discerning reader.

The third indicator is more unexpected: the human dimension of the text. Writing that conveys a voice, a nuance, almost a perceptible emotion, seems more likely to have been written by a person than by a machine, and this counts in the assessment. This point is worth bearing in mind, as it serves as a reminder that AI is not merely seeking cold, hard facts; it is also looking for content that resembles embodied human expertise.

Why the structure of a text is just as important as its content

Beyond the content itself, the form plays a decisive role in how an AI assesses whether content is usable. Headings phrased as genuine questions or direct statements, clear lists highlighting key points in the argument, and tables summarising several options: all of these make it easier to extract a specific passage, leaving no ambiguity as to what it demonstrates.

Technical markup complements this work. Structured data – particularly in Schema.org format – acts as a bridge between your content and an AI’s understanding of it. It specifies who wrote the text, what type of content it is, and which organisation it is associated with. Adding consistent Schema.org markup to your most strategic pages remains one of the most direct technical ways to facilitate this recognition.
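
As an illustration, the following Python sketch builds a small Schema.org `Article` description in JSON-LD and wraps it in the script tag that a page would embed. The headline, author and organisation are placeholder values, and the helper function names are made up for this example. A real page would normally carry more properties, such as publication date and URL.

```python
import json


def build_article_jsonld(headline: str, author_name: str, organisation_name: str) -> dict:
    """Describe who wrote the text, what type of content it is, and who publishes it."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "publisher": {"@type": "Organization", "name": organisation_name},
    }


def to_script_tag(data: dict) -> str:
    """Wrap the JSON-LD in the script element used to embed it in an HTML page."""
    return f'<script type="application/ld+json">\n{json.dumps(data, indent=2)}\n</script>'


# Placeholder values, for illustration only.
markup = build_article_jsonld(
    headline="How do AIs choose their sources?",
    author_name="Jane Doe",
    organisation_name="Example Media",
)
print(to_script_tag(markup))
```

Generating the markup from a single function is one way to keep it consistent across pages, since every article then uses the same property names and structure.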

AI also builds its confidence based on consistency observed across multiple sources. If the same expertise appears in several media outlets, several specialist blogs and across several different formats, the signal of reliability is automatically strengthened. A scattered and inconsistent presence, on the other hand, weakens this same signal, even if each piece of content, taken in isolation, appears sound.

What this actually means for you

If you produce online content, this development deserves your attention, not your concern. A study cited in several industry analyses estimates that fewer than five per cent of responses generated by generative AI still result in a click through to the original source. Traffic is shifting, but brand awareness continues to be built elsewhere – within the response itself, which your potential audience reads.

This mechanism calls for a rethink of certain habits. Content designed solely to please a ranking algorithm, lacking any real depth, loses its relevance when faced with an AI that assesses each piece of content on its own clarity and rigour. Structuring each section around a specific question, and answering it fully, then becomes a more useful writing habit than simply stringing together keywords.

You don’t need to rush to rewrite everything. But ensuring that your existing content cites identifiable sources, clearly answers a question rather than beating about the bush, and has a recognisable voice is already a solid step towards remaining visible in this new way of searching for information.

Reliability that remains largely unclear

We must, however, be honest about the limitations of this understanding. Unlike the rules of traditional SEO, which are extensively documented by Google itself through its E-E-A-T recommendations (experience, expertise, authoritativeness, trustworthiness), generative models do not explain precisely how they select their sources. The signals described here are based on observations and empirical studies, not on comprehensive official documentation.

This partial lack of transparency calls for caution rather than absolute certainty. By its very nature, AI remains a probabilistic system that weighs up numerous signals simultaneously, without any external party having a complete picture of the exact weighting. Bearing this nuance in mind helps to avoid turning observed trends into immutable rules.

AI: key takeaways

For generative AI, reliability cannot be reduced to a single ‘magic’ criterion. It is built on a range of indicators: the technical accessibility of the content, the presence of verifiable sources, a demonstration of genuine expertise, a structure that facilitates information extraction, and consistency observed across multiple platforms.

This requirement essentially reflects what you yourself would expect from a text before trusting it. AI simply formalises, on a large scale and using its own tools, a verification instinct that you probably already apply – without always realising it – every time you assess the credibility of a piece of information.

Sources

  • HubSpot, GEO (Generative Engine Optimisation): an ally or a replacement for SEO?
  • Traffic Makers, GEO vs SEO 2026: Dominating AI-powered search results
  • My Little Big Web, Generative Engine Optimisation, 2026 Guide
  • WAM Agency, Generative Engine Optimisation: everything you need to know about GEO
  • Toonetcreation, GEO SEO: Understanding Generative Engine Optimisation

