Google unveiled Gemini Omni at I/O. It is the company's first native multimodal AI model for enterprises, processing video, audio, images, and text within a single architecture.

Google's new multimodal AI model powers updates to Flow and Flow Music, including conversational video editing and AI-generated media tools.

The model marks Google's bid to collapse the multimodal generative stack — text-to-image, image-to-video, video-to-video, audio generation — into a single foundation model with a…