Confidence
Hallucinations
Control

What LLMs Know But Don't Show

Most explanations of AI hallucinations share one assumption: the model doesn't know the answer, so it makes one up.

Recent research suggests this assumption is incomplete.

In work presented at ICLR 2025, Orgad et al. examined what large language models encode internally about the truthfulness of their own outputs. The findings point to a different kind of gap: one between what the model represents internally and what it actually generates.

The Core Finding

The researchers found that classifiers trained on a model's internal representations could often predict whether its answer was correct. In some cases, the internal state pointed to the correct answer even when the model consistently generated a wrong one.

In other words: the signal is present. The behavior is not aligned with it.
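To make the idea of an "internal signal" concrete, here is a minimal sketch of a linear probe: a small classifier trained to predict answer correctness from hidden-state vectors. It is not the paper's code. The vectors are synthetic stand-ins generated under the assumption that correctness shifts the state along one direction, and the names make_synthetic_states, train_probe, and probe_accuracy are hypothetical.

```python
import math
import random


def make_synthetic_states(n, dim, rng):
    """Synthetic stand-in for hidden states (assumption, not real model data).

    Correct answers (label 1) are shifted along dimension 0, incorrect ones
    (label 0) are shifted the other way; all other dimensions are noise.
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0, 1) for _ in range(dim)]
        vec[0] += 1.5 if label else -1.5
        data.append((vec, label))
    return data


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(data, dim, epochs=50, lr=0.05):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for vec, label in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, vec)) + b)
            err = p - label
            w = [wi - lr * err * xi for wi, xi in zip(w, vec)]
            b -= lr * err
    return w, b


def probe_accuracy(w, b, data):
    """Fraction of examples where the probe's prediction matches the label."""
    hits = 0
    for vec, label in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, vec)) + b)
        hits += int((p >= 0.5) == bool(label))
    return hits / len(data)


rng = random.Random(0)
train = make_synthetic_states(400, 8, rng)
held_out = make_synthetic_states(200, 8, rng)

w, b = train_probe(train, 8)
accuracy = probe_accuracy(w, b, held_out)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

With a real model, the synthetic vectors would be replaced by hidden states captured while the model answers questions with known labels. If the probe then scores well above chance on held-out questions, the correctness signal is present in the representation, whatever the model ends up saying.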

What This Suggests

This reframes part of the hallucination problem.

In some cases, the model is not missing the information. Its internal state encodes the correct answer, but that answer does not reach the output.

Why This Happens

Language models are optimized to generate likely continuations. Likelihood and truthfulness are related, but not identical.

When they diverge, generation tends to favor the more likely continuation over the more accurate one.

This is not a failure of knowledge. It is a behavioral pattern at the point where the response is produced.
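The gap between likelihood and truthfulness can be shown with a toy example. The sketch below turns a set of scores into probabilities with a softmax and then picks the most probable answer, as greedy decoding would. The question, the candidate answers, and the logit values are all invented for illustration and do not come from any real model.

```python
import math


def softmax(logits):
    """Convert a dict of raw scores into a probability distribution."""
    peak = max(logits.values())
    exps = {k: math.exp(v - peak) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


# Invented scores for the question "What is the capital of Australia?".
# The frequently mentioned city gets the highest score in this toy setup.
logits = {"Sydney": 2.0, "Canberra": 1.5, "Melbourne": 0.2}
correct = "Canberra"

probs = softmax(logits)
greedy = max(probs, key=probs.get)  # greedy decoding: take the most likely option

print(f"greedy choice: {greedy}, correct answer: {correct}")
```

Nothing in the selection step consults whether an answer is true. It only ranks by probability, so a plausible but wrong answer wins whenever it scores higher.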

What This Implies

If relevant signals already exist internally, the question changes.

It is no longer only: how do we get better information into the model? It becomes: how do we ensure the model acts on the information it already has?
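One way to act on an internal signal, sketched here under stated assumptions, is to sample several candidate answers and let a correctness probe choose among them instead of taking the most likely one. This is an illustration of the general idea only, not a description of XyloIQ's method. The Candidate type, the select_answer function, the min_score threshold, and every number below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    text: str
    likelihood: float   # probability the model assigns to this answer
    probe_score: float  # hypothetical probe estimate that the answer is correct


def select_answer(candidates: List[Candidate], min_score: float = 0.5) -> Optional[str]:
    """Return the answer the probe trusts most, or None to abstain.

    Abstaining when no candidate clears min_score is a design choice of this
    sketch, not something the source article specifies.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.probe_score)
    return best.text if best.probe_score >= min_score else None


# Invented numbers for illustration only.
sampled = [
    Candidate("Sydney", likelihood=0.55, probe_score=0.20),
    Candidate("Canberra", likelihood=0.35, probe_score=0.85),
    Candidate("Melbourne", likelihood=0.10, probe_score=0.10),
]

by_likelihood = max(sampled, key=lambda c: c.likelihood).text
by_probe = select_answer(sampled)

print(f"most likely: {by_likelihood}, probe-selected: {by_probe}")
```

In this toy setup the two selection rules disagree, which is the situation the article describes: the signal favoring the right answer exists, and the question is whether the final response is allowed to depend on it.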

A Simple Conclusion

If models already know more than they show, the path forward is not only to teach them more. It is to ensure that what they already know shapes what they actually say.

We agree. So we did something about it.

This perspective is informed by ongoing work at XyloIQ on how AI behavior can be stabilized and governed as responses are formed.

Reference: Orgad, H., Toker, M., Gekhman, Z., Reichart, R., Szpektor, I., Kotek, H., Belinkov, Y. (2025). LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations. ICLR 2025.
