Video AI systems consistently fail to track what happens when the camera looks away: when the camera pans away from an object in motion and then returns, current models re-render the object in its original position rather than showing the logical result of the off-screen change. Scaling to more parameters makes this failure worse, not better, according to WRBench, a new benchmark that tests what researchers call "world model reliability." The benchmark presents AI video systems with scenes in which the camera pans away while an object is in motion, while a light changes, or while an open door should simply stay open, then pans back to reveal what the system believes happened in the meantime. A system that genuinely models the world would track what occurred during the off-screen interval. Current systems mostly don't.
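To make the pan-away-and-return check concrete, here is a toy sketch in Python. It is not WRBench's actual scoring method, which this article does not describe. The `ObjectState` class, the constant-velocity assumption, the `tolerance` value, and the "tracked" / "reset" / "other" labels are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ObjectState:
    """Hypothetical 1-D state of an object at the moment the camera pans away."""
    position: float  # arbitrary scene units
    velocity: float  # scene units per second


def expected_position(state: ObjectState, seconds_off_screen: float) -> float:
    """Assume constant velocity while the object is out of view."""
    return state.position + state.velocity * seconds_off_screen


def classify_return(
    state: ObjectState,
    seconds_off_screen: float,
    observed_position: float,
    tolerance: float = 0.5,
) -> str:
    """Label where the object reappears when the camera pans back.

    "tracked": close to where constant-velocity motion would put it.
    "reset":   close to where it was when the camera left (the failure
               the article describes).
    "other":   neither.
    """
    expected = expected_position(state, seconds_off_screen)
    if abs(observed_position - expected) <= tolerance:
        return "tracked"
    if abs(observed_position - state.position) <= tolerance:
        return "reset"
    return "other"


if __name__ == "__main__":
    ball = ObjectState(position=2.0, velocity=1.5)
    # The camera looks away for 4 seconds, then returns.
    print(classify_return(ball, 4.0, observed_position=7.8))
    print(classify_return(ball, 4.0, observed_position=2.1))
```

In this toy setup, a ball that starts at position 2.0 and moves at 1.5 units per second should reappear near 8.0 after four seconds off-screen. A model that redraws it near 2.0 has reset the scene rather than tracked it.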

Key facts

What: A new benchmark tests whether video AI systems can track what happens to parts of a scene the camera isn't currently showing. Across 23 models, the answer is mostly no — and making the models larger made the problem worse, not better.

When: 2026-06-19

Primary source: arXiv 2606.20545