We’ve been playing with Alibaba’s WAN2.1 text-to-video model lately. Like most image and video generation models, Wan has a lot of input parameters, and each of them can have a profound impact on the quality of the generated output.

What happens when you tweak those mysterious inputs? Let’s find out.

The experiment

We wanted to see how the guidance scale and shift input parameters affect the output. For our experiment, we used the WAN2.1 14B text-to-video model at 720p resolution.

To do this, we ran what’s called a “parameter sweep”: systematically testing different combinations of input values to understand how they affect the output. We generated a video for each combination of guidance scale and shift values, keeping all other parameters constant.
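
For readers who want to try something similar, here is a minimal sketch of how a sweep like this could be set up in Python. It only builds the list of input combinations; it does not call the model. The specific guidance scale and shift values, the prompt, the seed, and the key names (`guidance_scale`, `shift`) are all illustrative assumptions, not the values or input names we used, so check your model's input schema before adapting it.

```python
from itertools import product

# Hypothetical sweep values, chosen only for illustration.
GUIDANCE_SCALES = [1.0, 3.0, 5.0, 7.0]
SHIFTS = [1.0, 4.0, 8.0]

# Inputs held constant across every run (illustrative values and key names).
FIXED_INPUTS = {
    "prompt": "a cat walking through tall grass",
    "resolution": "720p",
    "seed": 42,
}


def build_sweep(guidance_scales, shifts, fixed_inputs):
    """Return one input dict per (guidance scale, shift) combination."""
    return [
        {**fixed_inputs, "guidance_scale": g, "shift": s}
        for g, s in product(guidance_scales, shifts)
    ]


runs = build_sweep(GUIDANCE_SCALES, SHIFTS, FIXED_INPUTS)

for run in runs:
    # Swap this print for a call to whatever client you use to run the model.
    print(f"guidance_scale={run['guidance_scale']} shift={run['shift']}")
```

Fixing the seed along with the prompt matters here: if the seed changed between runs, you couldn't tell whether a difference came from the swept parameters or from random variation.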