NVIDIA Releases Cosmos 3: A Two-Tower Mixture-of-Transformers Foundation Model Unifying Physical Reasoning, World Generation, and Action Generation

NVIDIA AI team have released Cosmos 3. It is a family of omnimodal world models for physical AI. The models combine physical reasoning, world generation, and action generation. All three capabilities live inside one open model. NVIDIA open sourced the checkpoints, training scripts, deployment tools, and datasets. The Cosmos 3 release targets robotics, autonomous vehicles, and warehouse monitoring teams.

Physical AI systems must understand the world before acting in it. Robots and vehicles need to perceive, predict, and then act. Earlier Cosmos releases split these jobs across separate models. Cosmos 3 unifies them with a Mixture-of-Transformers (MoT) architecture. The architecture is built around two towers.

The reasoner tower is a vision-language model (VLM). It interprets images, videos, and text using an autoregressive architecture. It understands motion, object interactions, and other physical context. NVIDIA team describes this tower as the model’s brain.

The generator tower produces future observations and action sequences. It uses a diffusion-based process for physics-aware video and actions. These outputs are conditioned on the reasoner tower’s understanding. Information flows one way, from reasoner to generator. The reasoner can run alone. The generator always activates both towers for guided generation.

NVIDIA Releases Cosmos 3: A Two-Tower Mixture-of-Transformers Foundation Model Unifying Physical Reasoning, World Generation, and Action Generation

NVIDIA Releases Cosmos 3: A Two-Tower Mixture-of-Transformers Foundation Model Unifying Physical Reasoning, World Generation, and Action Generation

Other newsrooms on this story

Related reading

Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning…

Nvidia unveils Cosmos 3 world model to enhance robot navigation

Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3 |…

How Cosmos 3 Helps Physical AI Think Before It Acts

Nvidia's new world model helps robots navigate the world

Introducing Cosmos 3 Edge

Other newsrooms on this story

Related reading

Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning…

Nvidia unveils Cosmos 3 world model to enhance robot navigation

Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3 |…

How Cosmos 3 Helps Physical AI Think Before It Acts

Nvidia's new world model helps robots navigate the world

Introducing Cosmos 3 Edge