NVIDIA Dynamo Snapshot: Fast Startup for Inference Workloads on Kubernetes | NVIDIA Technical Blog


Thursday, May 28, 2026


The cold-start problem

In production inference deployments, demand fluctuates over time, requiring inference replicas to scale elastically. However, cold-starting inference workloads on Kubernetes can take several minutes. During that time, GPUs are allocated but idle, generating no tokens and serving no requests.

This delay increases the risk of service level agreement (SLA) violations during traffic spikes, as the system cannot scale quickly enough to absorb sudden increases in demand.
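To make the cost of a slow scale-up concrete, the following is a minimal back-of-the-envelope sketch. It assumes a simple linear model in which excess traffic arrives at a constant rate while new replicas are still starting. The `ScaleUpEvent` type, its field names, and all numeric values are hypothetical illustrations, not measurements or APIs from Dynamo, vLLM, or Kubernetes.

```python
from dataclasses import dataclass


@dataclass
class ScaleUpEvent:
    """Hypothetical inputs for one scale-up; no value here comes from the article."""
    new_replicas: int       # replicas being added
    gpus_per_replica: int   # GPUs held by each replica
    cold_start_s: float     # seconds from pod scheduling to first served request
    excess_rps: float       # requests per second arriving above current capacity


def idle_gpu_seconds(event: ScaleUpEvent) -> float:
    """GPU time that is allocated but generates no tokens while replicas start."""
    return event.new_replicas * event.gpus_per_replica * event.cold_start_s


def backlog_at_ready(event: ScaleUpEvent) -> float:
    """Requests that queue up (or must be shed) before the new replicas can serve."""
    return event.excess_rps * event.cold_start_s


if __name__ == "__main__":
    # Illustrative numbers only: 4 single-GPU replicas, a 3-minute cold start,
    # and 10 requests per second of traffic beyond what existing replicas absorb.
    event = ScaleUpEvent(new_replicas=4, gpus_per_replica=1,
                         cold_start_s=180.0, excess_rps=10.0)
    print(f"idle GPU-seconds: {idle_gpu_seconds(event):.0f}")
    print(f"backlog at ready: {backlog_at_ready(event):.0f} requests")
```

Under this model both quantities grow linearly with cold-start time, which is why shortening startup reduces both wasted GPU allocation and the backlog that puts the SLA at risk.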

For a single-GPU vLLM (v0.20.0) workload, the cold-start latency breaks down as follows:

Figure 1. Cold-Start Latency Breakdown for a Single-GPU Inference Worker
