TL;DR

NVIDIA introduces World-Action Models (WAMs): robot policies that combine pretrained video prediction with action generation in a unified framework. As physical-AI foundation models move from research to production, they are reshaping infrastructure and compute-budget priorities.

Jun 15, 2026

Quick glossary for readers new to VLA/WAM terminology

VLA (Vision-Language-Action model): a robot policy that starts from a pretrained VLM backbone and adapts it to generate actions from visual observations and language instructions. Large-scale VLM pretraining is a core part of the recipe. See Pi-0 and GR00T N1.

WAM (World-Action Model): a policy that starts from a pretrained world-model or video backbone and adapts it to represent or predict how the scene changes over time and emit corresponding actions. We use WAM as the term throughout this post.

VLM (Vision-Language Model): a model pretrained on image-text or video-text data to produce language outputs grounded in visual inputs, usually before being adapted for robot control.
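
To make the glossary distinction concrete, here is a minimal Python sketch of the two policy shapes. It is an illustration only, not NVIDIA's implementation. The names `ToyVLA`, `ToyWAM`, `vlm_backbone`, `video_backbone`, and `action_head` are hypothetical, frames are plain lists of floats, and the "backbones" are toy functions standing in for pretrained networks. Language conditioning is left out of the WAM side purely for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = List[float]   # toy stand-in for an image observation
Action = List[float]  # toy stand-in for a robot command


@dataclass
class ToyVLA:
    """VLA shape: a (pretrained) VLM backbone feeding an action head."""
    vlm_backbone: Callable[[Frame, str], List[float]]  # (observation, instruction) -> features
    action_head: Callable[[List[float]], Action]       # features -> action

    def act(self, obs: Frame, instruction: str) -> Action:
        features = self.vlm_backbone(obs, instruction)
        return self.action_head(features)


@dataclass
class ToyWAM:
    """WAM shape: a (pretrained) video/world backbone that predicts scene change,
    plus a head that emits the corresponding action."""
    video_backbone: Callable[[Sequence[Frame]], Frame]  # past frames -> predicted next frame
    action_head: Callable[[Frame, Frame], Action]       # (current, predicted) -> action

    def act(self, history: Sequence[Frame]) -> Tuple[Frame, Action]:
        predicted = self.video_backbone(history)
        return predicted, self.action_head(history[-1], predicted)


def extrapolate(history: Sequence[Frame]) -> Frame:
    """Toy 'video prediction': linear extrapolation from the last two frames."""
    prev, last = history[-2], history[-1]
    return [2.0 * b - a for a, b in zip(prev, last)]


# Hypothetical toy components, chosen only so the example runs end to end.
vla = ToyVLA(
    vlm_backbone=lambda obs, text: [sum(obs), float(len(text.split()))],
    action_head=lambda feats: [2.0 * f for f in feats],
)
wam = ToyWAM(
    video_backbone=extrapolate,
    action_head=lambda cur, pred: [p - c for c, p in zip(cur, pred)],
)

vla_action = vla.act([1.0, 2.0], "pick up cube")
predicted_frame, wam_action = wam.act([[0.0, 1.0], [1.0, 3.0]])
```

The point of the sketch is the interface: the toy VLA maps an observation and an instruction straight to an action, while the toy WAM also returns a predicted future frame alongside the action it emits.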

developer.nvidia.com

Pretrained to Imagine, Fine-Tuned to Act: The Rise of World-Action Models | NVIDIA Technical Blog

11,537 words, ~52 min read

Related reading

World Action Models give robots the ability to simulate consequences before…

How to Post-Train Autonomous Vehicle Models in Closed-Loop with NVIDIA Alpamayo…

VLA or IL? A Controlled Dataset for Testing Whether Finetuning Turns Your VLA…

Machine Learning Posts

Meta’s new world model lets robots manipulate objects in environments they’ve…

AI's next big leap is models that understand the world.
