NVIDIA GB200 NVL72 introduces a fundamentally new way to build GPU clusters by extending NVIDIA NVLink coherence across an entire rack. This design enables exascale performance, but it also changes the assumptions that many scheduling systems were built on.

As a result, “rack-scale locality” becomes a hard constraint. Each rack forms a single NVLink domain, so when a workload crosses a domain boundary its traffic must use the slower inter-rack network, and performance drops sharply. A scheduler that treats the network fabric as a best-effort tree topology will fragment allocations in ways that increase queue times and degrade application performance.

To address this, the Slurm workload manager introduced the topology/block plugin and continues to expand its capabilities with segmented scheduling. The plugin lets administrators and users express application-specific NVLink requirements as atomic blocks rather than as loosely optimized allocations.
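To make the locality constraint concrete, here is a minimal sketch of the difference between a packed allocation, a fragmented one, and a segmented one. It is not Slurm code. It assumes a hypothetical numbering in which nodes are indexed consecutively and every 18 consecutive nodes (one per compute tray, 4 GPUs each, 72 GPUs per rack) share one NVLink domain. The helpers `domains_spanned` and `segments_are_local` are illustrative names only; they mimic the idea behind segmented scheduling, where a job may span several domains as long as no segment crosses a domain boundary.

```python
# Hypothetical model (not Slurm's implementation): nodes are numbered
# consecutively, and each run of NODES_PER_DOMAIN nodes is one NVLink domain.
NODES_PER_DOMAIN = 18  # assumption: 18 compute trays x 4 GPUs = 72 GPUs per rack


def domains_spanned(nodes: list[int], nodes_per_domain: int = NODES_PER_DOMAIN) -> int:
    """Return how many distinct NVLink domains an allocation touches."""
    return len({n // nodes_per_domain for n in nodes})


def segments_are_local(nodes: list[int], segment: int,
                       nodes_per_domain: int = NODES_PER_DOMAIN) -> bool:
    """Check that each consecutive group of `segment` nodes stays in one domain.

    The allocation as a whole may span several domains; only individual
    segments are required to avoid crossing a domain boundary.
    """
    if segment <= 0 or len(nodes) % segment != 0:
        raise ValueError("allocation size must be a positive multiple of the segment size")
    return all(
        domains_spanned(nodes[i:i + segment], nodes_per_domain) == 1
        for i in range(0, len(nodes), segment)
    )


if __name__ == "__main__":
    packed = list(range(0, 16))        # 16 nodes inside the first domain
    fragmented = list(range(10, 26))   # 16 nodes straddling two domains
    split = list(range(0, 8)) + list(range(18, 26))  # two 8-node groups, one per domain

    print(domains_spanned(packed))             # 1
    print(domains_spanned(fragmented))         # 2
    print(segments_are_local(fragmented, 16))  # False: the single segment crosses a boundary
    print(segments_are_local(split, 8))        # True: each 8-node segment is domain-local
```

The design choice being illustrated is that a segment is treated as an atomic unit: the scheduler may place different segments in different racks, but it never splits one segment across an NVLink domain boundary.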

This post explains what makes the NVIDIA GB200 NVL72 architecture unique, how Slurm block scheduling helps optimize placement and performance, and how to configure topology.yaml, the --segment option, and related features so you can move from prototype clusters to production-grade rack-scale orchestration.

What makes the NVIDIA GB200 NVL72 architecture unique?