TL;DR: ByteDance announced Seedance 2.5 at its Beijing conference, generating 30-second native 4K video from up to 50 multimodal reference inputs.

At its Volcano Engine FORCE conference in Beijing on Tuesday, ByteDance unveiled Seedance 2.5, a video generation model that produces 30-second clips at native 4K resolution from a single prompt. The company skipped four intermediate version numbers, jumping straight from the predecessor model to signal what it described as a generational leap.

An enterprise beta is already live, with a public launch targeted for early July. CEO Liang Rubo told the conference that climbing the AI summit is the company’s top priority, and that its model-as-a-service business is evolving into a foundational operation backed by long-term investment.

The headline upgrade is reference capacity: the model accepts up to 50 multimodal inputs, including images, audio clips, 3D white models, and style references, up from 12 in its predecessor. Those inputs give users far more granular control over style, motion, and composition than a text prompt alone.

The model generates at 4K natively rather than upscaling from a lower resolution, a distinction that matters for professional production pipelines. It supports 10-bit colour depth for smoother gradients and more room for post-production colour grading. ByteDance also claims 20 percent better prompt adherence, meaning fewer generations before a usable result.
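To put the colour-depth figure in perspective, the short sketch below works through the arithmetic behind bit depth in general. It is illustrative only and says nothing about how Seedance 2.5 encodes video internally; the function names are hypothetical, and the three-channel (RGB) layout is an assumption.

```python
# Illustrative arithmetic only: how many tonal steps a given bit depth allows.
# Function names are hypothetical; a three-channel (RGB) layout is assumed.

def levels_per_channel(bit_depth: int) -> int:
    """Number of distinct values one colour channel can take."""
    return 2 ** bit_depth


def total_colours(bit_depth: int, channels: int = 3) -> int:
    """Number of distinct colours across all channels combined."""
    return levels_per_channel(bit_depth) ** channels


for bits in (8, 10):
    print(
        f"{bits}-bit: {levels_per_channel(bits)} levels per channel, "
        f"{total_colours(bits):,} colours in total"
    )
```

Each channel at 10 bits has four times as many steps as at 8 bits, which is why gradients such as skies show less visible banding and footage tolerates heavier grading.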