Running Large-Scale GPU Workloads on Kubernetes with Slurm | NVIDIA Technical Blog

Slurm is an open source cluster management and job scheduling system for Linux. It manages job scheduling for over 65% of TOP500 systems. Most organizations running large-scale AI training have years of investment in Slurm job scripts, fair-share policies, and accounting workflows. The challenge is getting Slurm scheduling capabilities onto Kubernetes—the standard platform for managing GPU infrastructure at scale—without maintaining two separate environments.

Slinky, an open source project developed by SchedMD (now part of NVIDIA), takes two approaches to this integration:

slurm-bridge brings Slurm scheduling to native Kubernetes workloads, allowing Slurm to act as a Kubernetes scheduler for pods

slurm-operator runs full Slurm clusters on Kubernetes infrastructure, managing the complete lifecycle of Slurm daemons as pods

This post focuses on the slurm-operator, which is how NVIDIA runs Slurm on Kubernetes for large-scale GPU training clusters. It walks through the architecture of the operator and how it maps Slurm daemons to Kubernetes primitives, then covers deployment—including how Slinky slurm-operator integrates with your existing infrastructure. It also covers the Kubernetes ecosystem integrations that make this model practical. Finally, we share lessons from running Slinky in production at NVIDIA on clusters with over 1,000 GPU worker nodes and 8,000+ GPUs.
