Unlock Exascale Performance on NVIDIA GB200 NVL72 with Slurm Topology-Aware Job Scheduling | NVIDIA Technical Blog

As AI models grow in scale and complexity, realizing the full performance of modern accelerated infrastructure depends as much on how workloads are placed as on the hardware itself. NVIDIA GB200 NVL72 delivers exascale compute in a single rack, unlocking real-time trillion-parameter models. Yet capturing that performance in a shared cluster requires schedulers that understand the system architecture and align jobs with its network topology.

This post explains how Slurm topology-aware job scheduling works on NVIDIA GB200 NVL72, and provides scheduling recommendations for optimal GPU occupancy.

How does NVIDIA GB200 NVL72 deliver exascale compute?

NVIDIA GB200 NVL72 is an exascale computer in a single rack. Its 72 NVIDIA Blackwell GPUs are interconnected by the largest production scale-up compute fabric: NVIDIA NVLink provides 130 terabytes per second (TB/s) of low-latency GPU communication bandwidth for AI and high-performance computing (HPC) workloads. Multiple GB200 NVL72 systems combined in a cluster create a hierarchical network topology with large domains of very high networking bandwidth.

An AI training job can benefit greatly from the abundant networking bandwidth of GB200 NVL72 when it is scheduled to maximize use of the NVLink fabric. Recent results show that GB200 NVL72 delivers significant performance improvements across AI workloads, including training (>2.6x in recent MLPerf Training results), inference (real-time inference for trillion-parameter models, >1.5 million tokens/second for the OpenAI gpt-oss model, and state-of-the-art disaggregated serving), and reasoning.
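To make the placement idea concrete, here is a minimal Python sketch of why topology matters. It is not Slurm's algorithm or configuration. It assumes each scheduler node is one compute tray with 4 GPUs, so that 18 nodes form one 72-GPU NVLink domain. The node names and the functions `domains_spanned` and `pack_job` are hypothetical, introduced only for illustration.

```python
from collections import defaultdict

# Assumption: one node = one compute tray with 4 GPUs,
# and 18 such nodes share a single NVLink domain (72 GPUs).
GPUS_PER_NODE = 4
NODES_PER_DOMAIN = 18


def domains_spanned(nodes, domain_of):
    """Count the distinct NVLink domains touched by a set of nodes."""
    return len({domain_of[n] for n in nodes})


def pack_job(free_nodes, domain_of, nodes_needed):
    """Greedy sketch: choose free nodes so the job touches few domains.

    If some single domain has enough free nodes, use the one with the
    fewest free nodes that still fits (best fit). Otherwise fill from
    the domains with the most free nodes first. Returns None when the
    cluster does not have enough free nodes.
    """
    by_domain = defaultdict(list)
    for node in free_nodes:
        by_domain[domain_of[node]].append(node)

    if len(free_nodes) < nodes_needed:
        return None

    fitting = [v for v in by_domain.values() if len(v) >= nodes_needed]
    if fitting:
        return sorted(min(fitting, key=len))[:nodes_needed]

    chosen = []
    for nodes in sorted(by_domain.values(), key=len, reverse=True):
        take = min(len(nodes), nodes_needed - len(chosen))
        chosen.extend(sorted(nodes)[:take])
        if len(chosen) == nodes_needed:
            break
    return chosen


# Hypothetical two-rack cluster: rack 0 has 10 free nodes, rack 1 has 18.
domain_of = {f"r{r}n{i:02d}": r for r in range(2) for i in range(NODES_PER_DOMAIN)}
free = [f"r0n{i:02d}" for i in range(10)] + [f"r1n{i:02d}" for i in range(18)]

small_job = pack_job(free, domain_of, 8)    # 8 nodes = 32 GPUs
large_job = pack_job(free, domain_of, 20)   # 20 nodes = 80 GPUs
print(domains_spanned(small_job, domain_of))  # 1
print(domains_spanned(large_job, domain_of))  # 2
```

In this sketch the 32-GPU job lands entirely inside rack 0, so all of its GPU-to-GPU traffic can stay on NVLink, and rack 1 remains fully free for a job that needs a whole domain. The 80-GPU job cannot fit in one domain, so it necessarily spans two.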

Related reading

Achieving Peak System and Workload Efficiency on NVIDIA GB200 NVL72 with Slurm…

Running AI Workloads on Rack-Scale Supercomputers: From Hardware to…

Running Large-Scale GPU Workloads on Kubernetes with Slurm | NVIDIA Technical…

Maximizing GPU Utilization with NVIDIA Run:ai and NVIDIA NIM | NVIDIA Technical…