Playing peekaboo isn’t just a game for babies. It’s also one of the first ways they learn that something is still there, even if they can’t see it. When you hide your face and suddenly reappear—peekaboo!—the delightful surprise leads to giggles and learning.
Developing that skill isn’t about memorization. Instead, through observation, babies begin building a simple internal model of how the world works. By about a year old, they can tell that a ball that rolls behind a couch hasn’t vanished—it still exists, even out of sight.
Today’s AI systems—for all their conversational and pattern-matching prowess—can’t do this reliably. They can describe what’s in front of them, but they struggle with concepts like what’s hidden, or what will happen next in a sequence of actions.
The solution, many of the field’s top researchers believe, lies in so-called world models: AI systems designed not just to recognize patterns in text or images, but also to simulate how the physical world behaves. Trained on millions of hours of video, these models are meant to build an internal picture of how the world works, physics and all. That capability is crucial for a wide range of technologies: helping a self-driving car predict what happens if a child runs into the street, helping a home robot learn how to fold clothes, or simulating surgical procedures before a single incision is made.










