Google unveils Gemini Omni, a multimodal AI model that generates video from text, images, and audio

Google DeepMind just dropped what might be the most capable video generation model yet. Gemini Omni, unveiled at Google I/O on May 19-20, 2026, accepts text, images, audio, and video as inputs and spits out short video clips, roughly 10 seconds long, complete with synchronized audio.

The model’s first variant, Gemini Omni Flash, is the tip of the spear. It replaces Google’s earlier Veo model inside the Gemini app, marking a shift from standalone video generation toward what Google is calling “anything from anything” creation.

What Gemini Omni actually does

Early demonstrations showed effective text rendering within video, along with advanced scene editing capabilities.

Google is emphasizing improvements in world understanding, physics simulation, and character consistency. The company drew comparisons to its Nano Banana image model, which earned praise for visual fidelity. Gemini Omni extends that same logic into motion and sound, wrapping everything into a conversational interface where users can iteratively edit and refine their clips through dialogue.
