Today's robotics AI has a basic weakness: models learn to map camera images directly to movements, but they don't model how the world actually changes as a result of those movements.
A new survey paper from Fudan University, the Shanghai Innovation Institute, and the National University of Singapore is the first to systematically catalog a class of models designed to close that gap: World Action Models.
The authors map all current World Action Models along two main branches, joint and cascaded architectures, and show how each has diversified since 2024. | Image: Wang et al.
Robots that simulate their own near future
Existing vision-language-action models mostly learn a direct mapping from observations to actions. World Action Models go further: they also model how the environment will likely change, then couple that prediction to action generation.
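To make the difference concrete, here is a minimal toy sketch in Python. It is not taken from the survey: the one-dimensional state, the hand-written `world_model`, and the candidate-scoring rule in `world_action_policy` are all illustrative assumptions, and real systems use learned neural networks over images rather than a single number. The sketch only shows the idea of predicting a consequence first and choosing the action based on that prediction.

```python
from typing import Callable, Sequence

State = float   # assumption: toy 1-D position, standing in for a camera observation
Action = float  # assumption: toy 1-D motion command


def direct_policy(obs: State, goal: State) -> Action:
    """Reactive mapping: observation in, action out, with no prediction of consequences."""
    return 1.0 if goal > obs else -1.0


def world_model(obs: State, action: Action) -> State:
    """Hypothetical stand-in for a learned dynamics model.

    Predicts the next state for a given action. In this toy world the
    robot only moves half as far as commanded.
    """
    return obs + 0.5 * action


def world_action_policy(
    obs: State,
    goal: State,
    candidates: Sequence[Action],
    predict: Callable[[State, Action], State],
) -> Action:
    """Choose the candidate whose predicted next state lands closest to the goal.

    This is one simple way to couple prediction to action generation,
    picked for illustration only.
    """
    return min(candidates, key=lambda a: abs(predict(obs, a) - goal))


if __name__ == "__main__":
    obs, goal = 0.0, 0.25
    candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]

    reactive = direct_policy(obs, goal)
    predictive = world_action_policy(obs, goal, candidates, world_model)

    print("direct policy action:", reactive,
          "-> predicted state:", world_model(obs, reactive))
    print("world-action policy action:", predictive,
          "-> predicted state:", world_model(obs, predictive))
```

In this toy setup the reactive policy always issues a full-strength command and overshoots the goal, while the prediction-based policy picks the smaller command whose simulated outcome lands on it.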











