When Verification Isn't Enough

On April 18, 2026, a partner at Sullivan & Cromwell submitted a court filing containing dozens of errors, many of them fabricated case citations generated by AI. The mistake was not caught internally. It was identified by opposing counsel and entered into the public record.

The incident is not unusual. The legal profession has been documenting similar cases for years. What makes this one significant is not that it happened, but where it happened.

Sullivan & Cromwell is one of the most prestigious law firms in the United States. The firm has policies governing AI use. Lawyers are trained on its limitations. They are instructed to "trust nothing and verify everything."

And the filing went out anyway.

What Failed

The standard approach to AI adoption in high-value domains is clear: train users on limitations, require verification of outputs, and treat AI as assistive rather than authoritative. This is the model Sullivan & Cromwell had in place. The assumption behind it is that reliability can be enforced at the point of use. As long as the human verifies, the system's errors are contained. That assumption does not hold under real conditions.

The Scaling Problem

AI systems generate large volumes of output, and the more they are used, the more there is to review. Verification effort grows in proportion to output: every additional draft is one more draft to check. The value of AI, meanwhile, comes from reducing the time spent producing that output. At a certain point, the system produces more content than can be meaningfully verified without eroding the productivity gains it was meant to create. The result is a structural tradeoff. Verify everything, and the benefit is lost. Verify selectively, and risk is accepted. At scale, neither is stable.
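
The tradeoff can be made concrete with a little arithmetic. What follows is a minimal sketch, not a measurement. The function name, its parameters, and every number passed to it are hypothetical and chosen only for illustration. It assumes verification time is the same for every output and that errors occur independently at a fixed rate.

```python
def review_tradeoff(n_outputs, manual_minutes, ai_minutes,
                    verify_minutes, verify_fraction, error_rate):
    """Return (minutes_saved, expected_unchecked_errors) for a batch of outputs.

    All inputs are illustrative assumptions, not data from any real firm:
      n_outputs        number of documents produced
      manual_minutes   time to draft one document without AI
      ai_minutes       time to draft one document with AI
      verify_minutes   time to fully verify one AI-drafted document
      verify_fraction  share of documents that actually get verified (0.0 to 1.0)
      error_rate       assumed probability that a document contains an error
    """
    if not 0.0 <= verify_fraction <= 1.0:
        raise ValueError("verify_fraction must be between 0 and 1")

    verified = n_outputs * verify_fraction

    # Cost of the AI workflow: drafting everything, plus verifying a share of it.
    time_with_ai = n_outputs * ai_minutes + verified * verify_minutes
    time_manual = n_outputs * manual_minutes

    minutes_saved = time_manual - time_with_ai
    # Errors in documents nobody checked go out the door unnoticed.
    expected_unchecked_errors = (n_outputs - verified) * error_rate
    return minutes_saved, expected_unchecked_errors


# Hypothetical batch: 100 filings, 60 minutes each by hand, 5 minutes with AI,
# 55 minutes to verify one properly, and a 2% assumed error rate.
verify_all = review_tradeoff(100, 60, 5, 55, 1.0, 0.02)
verify_half = review_tradeoff(100, 60, 5, 55, 0.5, 0.02)
```

With these made-up numbers, verifying every filing saves no time at all, while verifying half of them saves 2,750 minutes and leaves roughly one erroneous filing unchecked. The specific figures do not matter. The point is that no setting of `verify_fraction` delivers both the full time savings and zero unchecked errors.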

A Pattern Across Industries

This dynamic is not limited to law. In healthcare, AI systems have been deployed under claims of extremely low error rates that later proved difficult to substantiate. In financial services, major firms now disclose hallucination as a material risk alongside cybersecurity threats. These are not isolated failures. They reflect a shared constraint: the cost of error is high, and the volume of output is higher.

What This Reveals

The problem is not that AI systems occasionally produce incorrect answers. The problem is that reliability depends on external verification. A system whose outputs must be checked line by line is not reliable at scale. It is conditionally useful, with risk shifted to the user. What failed in the Sullivan & Cromwell case was not policy, training, or intent. What failed was the assumption that verification could compensate for system behavior.

Where the Problem Actually Lives

Most current approaches to improving AI focus on better training data, larger models, and stricter output filters. These are necessary. But they operate either before a response is generated or after it is produced. They do not address how the system behaves while producing the response itself.

A Different Framing

This is not an intelligence problem. It is a behavior problem during generation. AI systems are capable of producing correct outputs. What is not reliable is how consistently those outputs are produced across real use. A system that can produce correct answers but cannot be relied on to do so without external verification does not meet the requirements of high-value professional use.

What This Means for Adoption

AI will continue to be adopted. The economic pressure is too strong to ignore. But adoption at scale depends on a shift in where reliability lives. As long as reliability depends on the user, the system cannot scale into domains where error is costly. For AI to be usable in those environments, reliability has to move into the system itself.

A Simple Conclusion

Capability is not the limiting factor. Behavior is. AI systems do not fail because they lack intelligence. They fail because their behavior is not reliably governed during generation.

We take that conclusion seriously, so we did something about it.

This perspective is informed by ongoing work at XyloIQ on how AI behavior can be stabilized and governed as responses are formed.

Reference: Nerkar, S. "A.I. 'Hallucinations' Created Errors in Court Filing, Top Law Firm Says." The New York Times, April 21, 2026.
