Building a "Git for Video" using Next.js and Google's Gemini Omni Model

If you’ve played around with current AI video generators, you already know the frustration: it’s basically a slot machine.

You write a massive prompt, hit "generate," wait 3 minutes, and pray. If the lighting is wrong or the character's jacket changed color, you have to rewrite the prompt and re-roll the dice, losing all your previous progress.

As a developer, this lack of "state" drove me crazy. Why can't we have version control or iterative diffs for video generation? Why can't I just tell the AI, "Keep everything exactly the same, but make it rain in the background"?

I decided to fix this by ditching the traditional NLE (Non-Linear Editor) timeline entirely and building a conversational video generator powered by Google's Gemini Omni model.

Here is how I built it, the technical hurdles of maintaining "video state," and why I think conversational UI is the future of video editing.

