How do AI models generate videos?

With powerful video generation tools now in the hands of more people than ever, let's take a look at how they work.

MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what’s coming next. You can read more from the series here.

It’s been a big year for video generation. In the last nine months OpenAI made Sora public, Google DeepMind launched Veo 3, and the video startup Runway launched Gen-4. All can produce video clips that are (almost) impossible to distinguish from actual filmed footage or CGI animation. This year also saw Netflix debut an AI visual effect in its show The Eternaut, the first time video generation has been used to make mass-market TV.

Sure, the clips you see in demo reels are cherry-picked to showcase a company’s models at the top of their game. But with the technology in the hands of more users than ever before—Sora and Veo 3 are available in the ChatGPT and Gemini apps for paying subscribers—even the most casual filmmaker can now knock out something remarkable.
