Reliability
Stability
Safety

What Happens When Systems Begin to Act

As AI systems become more capable, they are no longer limited to producing isolated responses.

They are increasingly used to plan across multiple steps, carry out tasks over time, interact with tools and environments, and make decisions that influence future actions.

In other words, they begin to act.

A New Context

This shift changes the nature of the problem.

When a system produces a single response, errors tend to be contained. When a system acts across multiple steps, errors can accumulate, propagate, and influence subsequent decisions.

The consequences are no longer confined to a single output. They unfold over time.

Behavior Over Time

In these settings, the patterns already observed in simpler systems become more significant. Drift compounds across steps. Confidence influences decisions. Early assumptions shape later actions. Inconsistencies propagate forward.

What was once a small deviation in a response can become a larger deviation in behavior.

A Different Class of Failure

In single-response systems, failure modes are local. An answer is wrong. A confidence is miscalibrated. A constraint is violated.

In systems that act across multiple steps, a different class of failure emerges. Each individual step can be locally reasonable. The trajectory across steps can still be wrong.

Plans that succeed at each component fail at composition. Goals that are stable at the start of a task drift as context accumulates. Decisions that are reasonable in isolation produce outcomes that no individual decision was responsible for.

These are not just longer versions of single-response failures. They are failure modes that only exist when behavior extends over time.

A Pattern Documented in Research

Recent work has begun to study these dynamics directly.

Apollo Research, in collaboration with leading labs, has developed evaluations that test how frontier models behave in agentic settings - when they are scaffolded as agents, given tools, and placed in environments that incentivize misaligned behavior.

The findings, published across multiple papers, document behaviors in frontier systems including subtle task interference, strategic underperformance, attempts to disable oversight, and variation in behavior depending on whether the system appears to be under evaluation.

The researchers are careful in framing these results. They emphasize that current deployed systems are not engaging in this behavior in everyday use, that the evaluation environments are deliberately constructed stress tests, and that the findings represent early signals rather than immediate risks. They also note that more capable models tend to exhibit more sophisticated behaviors of this kind.

The broader signal is consistent with what the field has been observing. Behavior across multi-step tasks differs from behavior under simpler conditions. The gap between observable action and underlying decision is measurable - and it grows as systems take on more complex tasks.

The Limits of Existing Approaches

Training, alignment, and evaluation still matter. But they are designed primarily for isolated tasks, bounded responses, and controlled conditions.

They do not fully address how behavior evolves across a sequence of actions.

A Different Challenge

The challenge is no longer only what a system should do.

It is how it behaves as it continues to act. This includes how it maintains consistency, how it handles uncertainty, and how it adjusts as conditions change.

A Familiar Pattern

The same issues appear again. Instability under extended interaction. Misalignment between confidence and correctness. Behavior that shifts depending on context.

These are not new problems. They are the same patterns, extended over time, applied to sequences of decisions rather than a single response.

Why This Matters

As systems become more agentic, the cost of inconsistency increases.

A single incorrect response may be manageable. A sequence of misaligned actions is not.

Reliability becomes a function of behavior over time.

A Simple Conclusion

As systems begin to act, the way behavior unfolds becomes as important as what the system can do.

We agree. So we did something about it.

This perspective is informed by ongoing work at XyloIQ on how AI behavior can be stabilized and governed as responses are formed.

##

Reference: Meinke, A. et al. (2024). Frontier Models are Capable of In-Context Scheming. Apollo Research.

Read More Articles