When Confidence and Truth Diverge

Confidence is one of the strongest signals in communication.

When something is said clearly and directly, it tends to feel more credible. When something is hedged or uncertain, it feels less so.

We rely on this instinct constantly. And increasingly, we apply it to AI systems.

The Assumption

There is an implicit belief that if a system sounds confident, it is more likely to be correct.

In many cases, that feels true. Clear answers are easier to trust. Uncertain ones are harder to act on.

What We Observe Instead

In practice, something more complicated is happening.

AI systems can deliver highly confident responses that are incorrect. They can express uncertainty even when they are correct. They can shift tone without a corresponding change in accuracy.

In other words, confidence and correctness are not tightly linked.

A Pattern Documented in Research

Recent work has begun to study this directly.

In a study published in the Harvard Data Science Review, researchers tested three frontier models (GPT-4o, GPT-4-turbo, and Mistral Large) on causal reasoning, formal logic, and statistical puzzles. They measured confidence in two ways: how often a model maintained its answer when prompted to reconsider, and how it scored its own confidence on a 0–100 scale.

The findings were striking. In some tasks, models reported a confidence score of 100 (full confidence) for every answer they gave, including incorrect ones. When prompted to reconsider, models often changed their answers, and the second answer was sometimes less accurate than the first. Confidence also shifted dramatically based on the phrasing of the prompt, even when the underlying question was unchanged.

The researchers concluded that current systems do not have an internally coherent sense of confidence: what is expressed as certainty does not reliably reflect what is known.
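To make the two measures concrete, here is a minimal Python sketch of how they could be tallied. It is an illustration under stated assumptions, not the study's code: the `Trial` record, the `summarize` helper, and the four sample trials are all hypothetical, and the numbers are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One question put to a model (hypothetical record layout)."""
    first_correct: bool     # was the initial answer correct?
    kept_answer: bool       # did the model keep its answer when asked to reconsider?
    second_correct: bool    # was the answer after reconsideration correct?
    stated_confidence: int  # self-reported confidence on a 0-100 scale


def summarize(trials):
    """Compare how confident a model sounds with how often it is right."""
    n = len(trials)
    if n == 0:
        raise ValueError("need at least one trial")
    return {
        # Share of trials where the answer survived a request to reconsider.
        "persistence_rate": sum(t.kept_answer for t in trials) / n,
        "first_accuracy": sum(t.first_correct for t in trials) / n,
        "second_accuracy": sum(t.second_correct for t in trials) / n,
        # Rescaled to 0-1 so it can be compared directly with accuracy.
        "mean_stated_confidence": sum(t.stated_confidence for t in trials) / n / 100,
    }


# Made-up data for illustration only; these are not results from the study.
trials = [
    Trial(first_correct=True, kept_answer=True, second_correct=True, stated_confidence=100),
    Trial(first_correct=True, kept_answer=False, second_correct=False, stated_confidence=100),
    Trial(first_correct=False, kept_answer=True, second_correct=False, stated_confidence=100),
    Trial(first_correct=False, kept_answer=False, second_correct=True, stated_confidence=100),
]

summary = summarize(trials)
# Gap between how certain the model says it is and how often it is right.
overconfidence = summary["mean_stated_confidence"] - summary["first_accuracy"]
print(summary)
print("overconfidence gap:", overconfidence)
```

In this toy data the model states full confidence every time yet is right only half the time, so the gap between stated confidence and accuracy is 0.5. The first answer also survives reconsideration only half the time, and accuracy does not improve on the second attempt.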

A Familiar Pattern

The same divergence shows up across different types of tasks.

In simple cases, confidence may track correctness reasonably well. But as complexity increases, confidence becomes less reliable. Certainty may reflect fluency rather than accuracy. Hesitation may reflect ambiguity rather than error.

The signal begins to drift.

Why This Happens

Language models are trained to generate likely continuations. Confidence, as expressed in language, is part of that generation process. It is not a direct measurement of truth.

This creates a gap. The system produces statements that sound more or less certain, without consistently aligning that expression with underlying correctness.

The Deeper Issue

This is not just a problem of wording.

It reflects something deeper. Confidence is not always calibrated: stated certainty does not reliably match how often the answers are right. Internal signals are not consistently expressed. Behavior is not fully regulated as a response unfolds.

The system may know more than it shows. Or show more certainty than it should.
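One common way to put a number on calibration is expected calibration error (ECE): group answers by stated confidence, then measure how far each group's average confidence sits from its actual accuracy. The sketch below is a generic illustration of that metric, not a method taken from the study or from any particular system. The function name, the bin count, and the sample inputs are assumptions made for the example.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average gap between stated confidence and accuracy.

    confidences: floats in [0, 1], one per answer
    correct:     booleans, True where the answer was right
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")

    # Sort answers into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the top bin
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        # Each bin contributes in proportion to how many answers it holds.
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Illustrative inputs only: the same right/wrong pattern, two ways of sounding.
outcomes = [True, False, True, False]
always_certain = expected_calibration_error([1.0, 1.0, 1.0, 1.0], outcomes)
well_matched = expected_calibration_error([0.5, 0.5, 0.5, 0.5], outcomes)
print(always_certain, well_matched)
```

Both hypothetical systems are right half the time. The one that always claims full certainty has an error of 0.5, while the one that claims 50% confidence has an error of 0. Accuracy is identical; only the reliability of the confidence signal differs.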

Why This Matters

If confidence cannot be trusted as a signal, we over-trust incorrect answers, second-guess correct ones, and lose a key mechanism for deciding when to rely on the system.

This becomes more problematic as systems are used in complex reasoning, decision-making, and real-world advisory roles.

A Different Perspective

Confidence is not just something a system expresses. It is something that must be aligned with underlying signals.

Not after a response is complete. Not only during training. But as the response is being formed.

A Simple Conclusion

If confidence and truth can diverge, then how confidence is expressed cannot be left ungoverned.

We agree. So we did something about it.

This perspective is informed by ongoing work at XyloIQ on how AI behavior can be stabilized and governed as responses are formed.

Reference: Pawitan, Y., & Holmes, C. (2025). Confidence in the Reasoning of Large Language Models. Harvard Data Science Review.