The Retrieval Delusion

A recent New York Times investigation into Google's AI Overviews surfaced an important signal. The system often retrieves the correct sources. It does not always produce the correct answer. In more than half of cases studied, the system generated accurate responses without relying on the sources it cited. In other cases, it retrieved the correct information and still produced incorrect outputs. The retrieval worked. The failure happened during generation.

What This Reveals

The dominant assumption in AI over the past two years has been that hallucinations are a data problem. The standard remedy has been Retrieval-Augmented Generation (RAG): the idea that providing models with reliable external information will improve the quality of their outputs. If a model produces incorrect answers, the reasoning goes, it must not have the right information. Provide reliable sources, and the problem should diminish. The reporting suggests otherwise.

A Different Failure Mode

When a system generates an answer, it is not simply repeating retrieved information. It is constructing a response. Retrieved documents are one signal among many. They compete with prior patterns, internal associations, and context accumulated during the interaction. The relationship between retrieval and output is not deterministic. Even when the correct information is present, it may not be what ultimately gets expressed.

What Retrieval Cannot Solve

The most visible failure cases illustrate this clearly. A system can retrieve a correct source and still generate an incorrect answer. It can retrieve a weak or misleading source and treat it as authoritative. It can produce a correct answer while failing to ground it in the information it presents. These are not retrieval failures. They are failures of how the system decides what to produce.
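
One way to make these cases concrete is to label each response on three independent questions and see which combination it falls into. The sketch below is illustrative only: the field names, the categories, and the idea of reviewer-supplied labels are assumptions made for this example, not the method used in the Times analysis or in any particular system.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    GROUNDED_CORRECT = "correct answer, supported by the cited sources"
    CORRECT_BUT_UNGROUNDED = "correct answer, not supported by the cited sources"
    CORRECT_SOURCE_WRONG_ANSWER = "correct source retrieved, incorrect answer"
    WEAK_SOURCE_TRUSTED = "incorrect answer that follows a weak or misleading source"
    OTHER_ERROR = "incorrect answer, unrelated to any retrieved source"


@dataclass(frozen=True)
class Judgment:
    """Hypothetical reviewer labels for one generated response."""
    retrieved_correct_source: bool  # at least one retrieved source has the right information
    answer_correct: bool            # the generated answer is factually right
    answer_follows_sources: bool    # the answer's claims match what the cited sources say


def classify(j: Judgment) -> Outcome:
    """Map one set of labels to an outcome category."""
    if j.answer_correct:
        if j.answer_follows_sources:
            return Outcome.GROUNDED_CORRECT
        return Outcome.CORRECT_BUT_UNGROUNDED
    # From here on the answer is wrong. Check for a correct source first,
    # so that "had the right information and did not use it" takes priority.
    if j.retrieved_correct_source:
        return Outcome.CORRECT_SOURCE_WRONG_ANSWER
    if j.answer_follows_sources:
        return Outcome.WEAK_SOURCE_TRUSTED
    return Outcome.OTHER_ERROR


def tally(judgments):
    """Count how many responses fall into each outcome category."""
    return Counter(classify(j) for j in judgments)
```

Tallying these labels over a sample of responses separates the cases where the right source was missing from the cases where it was present but not used, which is the distinction this section draws.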

The Ceiling of the Current Approach

The most important signal in the reporting is not any individual error. It is the pattern. As systems become more capable, their outputs are not always constrained by the information they are given. Internal capability can diverge from external grounding. This creates a ceiling. You cannot reliably improve outputs by improving inputs alone.

A Simple Conclusion

This is not a data problem. It is a behavior problem during generation. Providing better information does not ensure that information is used correctly. The question is not only what the system knows, or what it retrieves. It is how it behaves as it produces a response.

Where This Leads

Retrieval improves access to information. It does not govern how that information is used. Until that layer is addressed, improving data and retrieval will continue to have diminishing returns.

We agree. So we did something about it.

This perspective is informed by ongoing work at XyloIQ on how AI behavior can be stabilized and governed as responses are formed.

Reference: Mickle, T., Metz, C., Freedman, D., Mondría Terol, T., and Collins, K. "How Accurate Are Google's A.I. Overviews?" The New York Times, April 7, 2026.