Stability AI has unveiled Stable Audio 3.0, a new generation of audio models, three of which ship with open weights. The models generate music tracks up to six minutes long and were trained entirely on licensed data, according to the company.
The model family includes four variants. Stable Audio 3.0 Small SFX and Stable Audio 3.0 Small each have 459 million parameters and produce tracks up to two minutes long in 0.44 seconds of inference time on an H200 GPU. The first focuses on sound effects and is designed for smartphones and consumer laptops. The second targets short music pieces. Stable Audio 3.0 Medium has 1.4 billion parameters and generates tracks up to 6 minutes 20 seconds long in 1.31 seconds. All three are available as open-weights models on Hugging Face.
The largest model, Stable Audio 3.0 Large with 2.7 billion parameters, isn't available as open weights. It's accessible only through the Stability AI API, through partner fal.ai, or through an enterprise license that lets a company host it on its own infrastructure. Stability AI says it delivers the highest musicality and is built for music platforms with high generation volume.
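To put the quoted speed figures for the open-weights models in perspective, the short Python sketch below converts them into a real-time factor, meaning seconds of audio produced per second of inference. It assumes the quoted inference times apply to each model's maximum track length, which the announcement does not state explicitly, and the H200 GPU is named only for the Small models. The function and variable names are illustrative.

```python
# Rough real-time factor: seconds of audio generated per second of inference.
# Assumption: each quoted inference time corresponds to the model's maximum
# track length. The source names the H200 GPU only for the Small variants.

def real_time_factor(audio_seconds: float, inference_seconds: float) -> float:
    """Return how many seconds of audio are produced per second of compute."""
    return audio_seconds / inference_seconds

# (maximum track length in seconds, quoted inference time in seconds)
quoted_figures = {
    "Small / Small SFX": (2 * 60, 0.44),
    "Medium": (6 * 60 + 20, 1.31),
}

for name, (audio_s, infer_s) in quoted_figures.items():
    factor = real_time_factor(audio_s, infer_s)
    print(f"{name}: roughly {factor:.0f}x faster than real time")
```

Under that assumption, both model sizes come out at a few hundred times faster than real time. No timing is quoted for the Large model, so it is left out.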
New architecture enables longer, more flexible audio output
Stable Audio 3.0 runs on a new architecture with a semantic-acoustic autoencoder that allows longer and more flexible audio output, according to Stability AI. Generation works at variable length, with control over duration down to the second.












