Turn the camera away, and the AI's world freezes

Video AI systems consistently fail to track what happens when the camera looks away. When the camera pans away from an object in motion and then returns, current models re-render the object in its original position rather than showing the logical result of off-screen change. Scaling to more parameters makes this failure worse, not better, according to WRBench, a new benchmark that tests what researchers call "world model reliability." The benchmark presents AI video systems with scenes where something happens off-screen: the camera pans away while an object is in motion, while a light changes, or while an open door should stay open. It then pans back to see what the system believes should have happened. A system that genuinely models the world would track what occurred during the off-screen interval. Current systems mostly don't.

Key facts

What: A new benchmark tests whether video AI systems can track what happens to parts of a scene the camera isn't currently showing. Across 23 models, the answer is mostly no — and making the models larger made the problem worse, not better.

When: 2026-06-19

Primary source: arXiv 2606.20545
