The model marks Google's bid to collapse the multimodal generative stack (text-to-image, image-to-video, video-to-video, and audio generation) into a single foundation model with a single editing surface.

Google's new multimodal AI model powers updates to Flow and Flow Music, including conversational video editing and AI-generated media tools.

The Omni model accepts prompts via text, image, speech, and video, and is initially focused on generating realistic videos.