Microsoft Research's Mirage gives video generation a persistent spatial memory that doesn't forget what's around the corner

Mirage, a video world model from Microsoft Research and several universities, stores scene information directly in latent space instead of pixel-based point clouds. That slashes compute time and graphics memory while keeping scenes spatially consistent through long camera moves. It still can't reliably track moving objects across segments.

Sunday, June 14, 2026

Mirage is a new video world model that skips the costly detour through pixel-based memory. That speeds up generation and keeps a scene's spatial structure stable even during long camera moves. Researchers from several universities built it with Microsoft Research.

Video world models turn a starting frame and a camera path into plausible moving images, which makes them useful as simulators. But without some kind of memory, even strong generators lose track of space over time. A corner of a room you've already passed looks different when the camera swings back. Furniture shifts, and textures change.

Systems like Voyager, WonderWorld, and Spatia try to fix this with a 3D point cloud that gets fed a steady stream of color data. Every new generation step has to render that cloud and then translate the result back into the model's internal feature space. Microsoft's new paper calls this a double bottleneck: It eats compute, and information leaks out every time the data passes through pixel space.

Mirage takes a different approach. Rather than holding onto visible color points, it stores the internal image features the diffusion model already uses. Each feature gets a spot in 3D space, which turns it into an entry in spatial memory.
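The idea of attaching a 3D position to each latent feature can be illustrated with a small sketch. It is not Mirage's actual implementation, which the article does not detail. The class names, the pinhole camera looking straight down the +z axis, and the nearest-depth-wins rule are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MemoryEntry:
    # Hypothetical layout: one latent feature vector pinned to a world-space point.
    position: Tuple[float, float, float]
    feature: List[float]


@dataclass
class LatentSpatialMemory:
    # Hypothetical store; a real system would use tensors, not Python lists.
    entries: List[MemoryEntry] = field(default_factory=list)

    def write(self, position, feature):
        """Add one latent feature at a 3D position."""
        self.entries.append(MemoryEntry(tuple(position), list(feature)))

    def read(self, cam_pos, focal, grid_w, grid_h) -> Dict[Tuple[int, int], List[float]]:
        """Project stored features onto the latent grid of a camera at cam_pos.

        Assumes an unrotated pinhole camera looking along +z. When several
        entries land in the same cell, the one closest to the camera is kept.
        """
        cx, cy = grid_w / 2, grid_h / 2
        nearest = {}  # (row, col) -> (depth, feature)
        for entry in self.entries:
            x = entry.position[0] - cam_pos[0]
            y = entry.position[1] - cam_pos[1]
            z = entry.position[2] - cam_pos[2]
            if z <= 0:
                continue  # behind the camera, not visible
            col = math.floor(focal * x / z + cx)
            row = math.floor(focal * y / z + cy)
            if 0 <= col < grid_w and 0 <= row < grid_h:
                cell = (row, col)
                if cell not in nearest or z < nearest[cell][0]:
                    nearest[cell] = (z, entry.feature)
        return {cell: feat for cell, (_, feat) in nearest.items()}


memory = LatentSpatialMemory()
memory.write((0.0, 0.0, 2.0), [1.0, 0.0])   # near surface
memory.write((0.0, 0.0, 5.0), [0.0, 1.0])   # surface hidden behind it
front_view = memory.read((0.0, 0.0, 0.0), focal=4, grid_w=8, grid_h=8)
moved_view = memory.read((0.0, 0.0, 3.0), focal=4, grid_w=8, grid_h=8)
```

In this toy version, the features read back for a camera pose never pass through pixel space: they are looked up by geometry and could be handed to the generator as conditioning directly, which is the property the paragraph above describes.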
