Google launches Gemini Omni Flash, a conversational video-generation model with avatar mode held back

The first model in DeepMind’s new Omni family will generate and edit video from any combination of image, audio, video, and text inputs. Speech editing is being withheld; SynthID watermarking is on by default.

On Tuesday, at its I/O 2026 developer conference, Google introduced Gemini Omni, a new multimodal model family from Google DeepMind designed to generate and edit video from any combination of image, audio, video, and text inputs.

The first model in the family, Gemini Omni Flash, started rolling out the same day to the Gemini app and Google Flow for Google AI Plus, Pro, and Ultra subscribers, and to YouTube Shorts and the YouTube Create app at no cost. API access for developers and enterprise customers will follow in the coming weeks.

Koray Kavukcuoglu, CTO of Google DeepMind and Chief AI Architect at Google, frames the product as a model that ‘combines images, audio, video, and text as input and generates high-quality videos grounded in Gemini’s real-world knowledge.’ These inputs can be mixed within a single prompt.

Edits are made conversationally, with each instruction building on the previous one, so that characters, physics, and scene context persist across turns. Output modalities beyond video, including image and audio generation, are ‘coming in time,’ Kavukcuoglu wrote on the company’s blog.
