The NVIDIA GB200 NVL72 and NVIDIA GB300 NVL72 systems, featuring NVIDIA Blackwell architecture, are rack-scale supercomputers. They’re designed with 18 tightly coupled compute trays, massive GPU fabrics, and high-bandwidth networking packaged as a unit.

For AI architects and HPC platform operators, the challenge isn’t just racking and stacking hardware—it’s turning infrastructure into safe, performant, and easy-to-use resources for end users. The mismatch between rack-scale hardware topology and scheduler abstractions is where most of the operational complexity lives. Left unaddressed, schedulers operate on a flat pool of GPUs and nodes, overlooking the system’s hierarchical and topology-sensitive design.

This is the gap that a validated software stack, such as NVIDIA Mission Control, is designed to bridge. Mission Control provides ‌rack-scale control planes for NVIDIA Grace Blackwell NVL72 systems. With a native understanding of NVIDIA NVLink and NVIDIA IMEX domains, it integrates with workload management platforms like Slurm and NVIDIA Run:ai. These capabilities will also be supported for the NVIDIA Vera Rubin platform, including for NVIDIA Rubin NVL8.

This post demonstrates how Mission Control, Slurm, and NVIDIA Run:ai turn advanced GPU architecture concepts—such as NVLink and IMEX domains—into an operational AI factory that is scalable, schedulable, and easy to manage.