Google has unveiled Gemini Omni, a new family of generative models designed to “create anything,” and you can use it today to generate surprisingly realistic videos.

In recent years, Google has been working on a “world model” that can maintain a cohesive, grounded world. The company explored the idea through its Genie model, which generates interactive video-game-esque experiences based on user prompts. Google has also long offered the Veo and Nano Banana models, which provide capable video and image creation and editing from text and image inputs.

As part of I/O 2026, Google revealed Gemini Omni, a new model that leverages a similar level of multimodal understanding grounded in reality. While Omni currently only generates video content, Google presents it as being designed to “create anything from any input.” This means bringing together text, images, video, and audio (initially limited to speech samples) to create a unified output video. After generation, you can further refine your video in subsequent turns.

Google’s initial demos for Omni are quite impressive, showing how Gemini understands each of the elements in the final video. The rolling marble video is a great example, with believable physics for the ball and convincing sound effects for each bounce and the bell ring.