Meta Description: NVIDIA just open-sourced Nemotron-Labs Diffusion, a family of 3B, 8B, and 14B diffusion language models that merge autoregressive and diffusion generation for up to 6.4× faster inference. This technical deep dive covers the architecture, the training methodology, the three generation modes, and how to run it today with SGLang.

Table of Contents

The Speed Wall Autoregressive LLMs Hit

What Are Diffusion Language Models?

Why DLMs Struggled — Until Now