Diffusion Language Models Are Here: Deep Dive into NVIDIA's Nemotron-Labs DLM Architecture

Sunday, May 24, 2026

3,563 words, ~16 min read

Meta Description: NVIDIA just open-sourced Nemotron-Labs Diffusion — a family of 3B, 8B, and 14B diffusion language models that merge autoregressive and diffusion generation for up to 6.4× faster inference. Here's the complete technical deep dive into the architecture, training methodology, three generation modes, and how to run it today with SGLang.

Table of Contents

The Speed Wall Autoregressive LLMs Hit

What Are Diffusion Language Models?

Why DLMs Struggled — Until Now

Other newsrooms on this story

Related reading

Diffusion Language Models: How NVIDIA Nemotron-Labs Diffusion Shatters the…

NVIDIA's Nemotron Diffusion: One Model, Three Generation Modes, 6× Faster

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language…

NVIDIA AI Releases Nemotron-Labs-Diffusion: A Tri-Mode Language Model with 6×…

NVIDIA Releases Nemotron-Labs-TwoTower: an Open-Weight Diffusion Language Model…

Consistency diffusion language models: Up to 14x faster inference without…
