Google unveiled Gemini Omni at I/O, its first natively multimodal AI model for enterprises, which processes video, audio, images, and text within a single architecture.

Introducing Gemini Omni, which lets you create anything from any input and edit the results in natural, conversational language.

Google's new multimodal AI model powers updates to Flow and Flow Music, including conversational video editing and AI-generated media tools.