Fei-Fei Li wants to settle a debate that’s been simmering in the AI community for a while now: what actually counts as a “world model” and what’s just a fancy video generator wearing a lab coat.
The Stanford professor and World Labs CEO published “A Functional Taxonomy of World Models” on June 3, 2026, laying out a framework that categorizes world models into three distinct functions: renderer, simulator, and planner. The paper argues these three roles form an interconnected loop that underpins what Li calls “spatial intelligence,” the kind of AI that can actually understand and interact with physical environments.
Three jobs, one model
The renderer function handles visual generation: it turns input data into high-fidelity imagery. This is what most current “world models” actually do, and Li argues pointedly that systems stuck at this level are not true world models at all.
The simulator function goes deeper. It doesn’t just show you what something looks like. It models physics, cause and effect, and the way objects interact over time. A renderer can show you a ball rolling toward a cliff edge. A simulator knows the ball will fall off.