The model marks Google's bid to collapse the multimodal generative stack — text-to-image, image-to-video, video-to-video, audio generation — into a single foundation model with a single editing surface.

Introducing Gemini Omni, which allows you to create anything from any input and edit naturally using conversational language.

Google's new multimodal AI model powers updates to Flow and Flow Music, including conversational video editing and AI-generated media tools.