NVIDIA delivered a clean sweep in MLPerf Training v6.0, the latest edition of industry-standard AI training benchmarks developed by the MLCommons consortium. NVIDIA achieved the fastest time to train at scale, and also delivered the highest performance when normalized on a per-accelerator basis on every benchmark. It was also the only platform to submit on every test.

MLCommons introduced new pretraining benchmarks in this round designed to reflect the latest trends in AI models, including DeepSeek-V3, a massive 671B-parameter Mixture of Experts (MoE) model that also serves as the base for the popular DeepSeek-R1 reasoning model, and GPT-OSS-20B, a small-but-capable MoE.

The NVIDIA platform was the only one to submit results on both new workloads, with the NVIDIA GB300 NVL72 system setting the performance bar through optimized NVIDIA software stacks and a design that connects 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs as one using NVIDIA NVLink and NVIDIA NVLink Switch.

Unprecedented scale and throughput across the scale-out fabric

Training state-of-the-art models requires large-scale infrastructure and the ability to efficiently execute workloads across thousands of interconnected processors. In several entries this round, NVIDIA cloud service provider partners scaled up to 8,192 Blackwell GPUs working in unison across diverse cloud data centers. These submissions proved the real-world robustness of the Blackwell platform across production hyperscale data center fleets, demonstrating strong scaling trends across these varied cluster environments.