I Stopped Paying for Idle GPUs - Scale-to-Zero AI Inference on OKE with KEDA

A single A10 GPU on OCI costs $1.52/hr. Running 24/7, that's $1,094/month. For a production inference service with steady traffic, that's fine. But I had a staging environment and a couple of internal tools that got maybe 20 requests per day. I was paying over $2,000/month for GPUs that sat idle 95% of the time.
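
To sanity-check those numbers, here's a quick cost model in Python. It assumes a 30-day month, which is how $1.52/hr becomes $1,094/month. It also assumes two GPUs, which is what "over $2,000/month" implies at that rate. The scale-to-zero line is hypothetical. It bills each GPU only for the ~5% of the day it isn't idle, and it assumes the GPU node itself goes away when the pod does. Treat it as an optimistic floor, not a measured result.

```python
# Back-of-the-envelope GPU cost model.
# Assumptions: a 30-day month, a flat $1.52/hr A10 rate, and that the GPU
# node itself is released when idle (scaling pods alone doesn't stop billing).

A10_HOURLY_USD = 1.52
DAYS_PER_MONTH = 30


def monthly_cost(gpus: int, busy_hours_per_day: float = 24.0,
                 hourly_rate: float = A10_HOURLY_USD) -> float:
    """Monthly cost of `gpus` GPUs, each billed `busy_hours_per_day` hours/day."""
    return gpus * busy_hours_per_day * DAYS_PER_MONTH * hourly_rate


if __name__ == "__main__":
    print(f"1 GPU, always on:   ${monthly_cost(1):,.2f}/month")
    print(f"2 GPUs, always on:  ${monthly_cost(2):,.2f}/month")
    # Hypothetical floor: bill only the ~5% of the day that isn't idle.
    # Real numbers run higher because cold starts and cooldowns are billed too.
    print(f"2 GPUs, ~5% uptime: ${monthly_cost(2, busy_hours_per_day=24 * 0.05):,.2f}/month")
```

The first two lines reproduce the $1,094 and "over $2,000" figures. The gap between the second and third lines is the money scale-to-zero is chasing.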

The obvious solution: scale to zero when there's no traffic, spin up when a request comes in. KEDA does this on Kubernetes, but getting it to work properly with GPU pods took some figuring out.
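
In KEDA terms, "scale to zero" mostly comes down to a ScaledObject with minReplicaCount set to 0. Here's a minimal sketch of one, built in Python so it can be generated per service and piped into kubectl as JSON. The Deployment name, namespace, Prometheus address, and query are all hypothetical. The Prometheus trigger is just one of KEDA's scalers and not necessarily the one this setup ends up using.

```python
import json

# Sketch of a KEDA ScaledObject for a GPU inference Deployment.
# Assumptions (hypothetical): the Deployment name, namespace, Prometheus URL,
# and query below; swap in whatever your cluster actually exposes.


def gpu_scaled_object(deployment: str, namespace: str, prometheus_url: str,
                      query: str, cooldown_seconds: int = 600) -> dict:
    """Return a ScaledObject manifest that lets `deployment` scale to zero."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": f"{deployment}-scaler", "namespace": namespace},
        "spec": {
            "scaleTargetRef": {"name": deployment},
            "minReplicaCount": 0,    # zero replicas means zero GPU pods
            "maxReplicaCount": 1,    # assumption: cap at a single GPU
            "pollingInterval": 30,   # seconds between trigger checks
            "cooldownPeriod": cooldown_seconds,  # idle time before 1 -> 0
            "triggers": [{
                "type": "prometheus",
                "metadata": {
                    "serverAddress": prometheus_url,
                    "query": query,
                    "threshold": "1",            # target value per replica
                    "activationThreshold": "0",  # leave zero when value > 0
                },
            }],
        },
    }


if __name__ == "__main__":
    manifest = gpu_scaled_object(
        deployment="llm-staging",  # hypothetical
        namespace="inference",     # hypothetical
        prometheus_url="http://prometheus.monitoring.svc:9090",  # hypothetical
        query='sum(rate(http_requests_total{app="llm-staging"}[2m]))',  # hypothetical
    )
    print(json.dumps(manifest, indent=2))
```

Something like `python gpu_scaler.py | kubectl apply -f -` would apply it, since kubectl accepts JSON manifests on stdin. cooldownPeriod is the knob that matters most. If it's too short, the next request pays a GPU cold start. If it's too long, you're back to paying for idle time.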

Why Scaling GPUs Is Harder Than Scaling CPU Pods

With normal HTTP services, KEDA watches a metric (HTTP requests, queue depth, whatever), and Kubernetes can spin up a new pod in seconds. The user barely notices.
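
The scaling loop itself is simple. Below is a simplified Python model of it, not KEDA's actual code. KEDA activates the workload from 0 to 1 replica when the metric crosses activationThreshold. The HPA sizes it from 1 to N with desired = ceil(metric / target). KEDA drops it back to zero once cooldownPeriod passes with no activity. The model ignores HPA stabilization windows and assumes pods start instantly, which is exactly the assumption that breaks for GPU pods.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scaler:
    """Simplified model of KEDA + HPA scaling for one Deployment.

    Assumptions: AverageValue target, no HPA stabilization windows,
    and pods that are ready the instant they are requested.
    """
    target: float = 1.0       # metric value one replica should handle
    activation: float = 0.0   # metric must exceed this to leave zero
    min_replicas: int = 0
    max_replicas: int = 1
    cooldown: float = 600.0   # seconds of inactivity before scaling to zero
    replicas: int = 0
    last_active: Optional[float] = None

    def step(self, metric: float, now: float) -> int:
        if metric > self.activation:
            self.last_active = now
            # 0 -> 1 is KEDA's activation; 1 -> N mirrors the HPA formula
            # desired = ceil(metric / target) for an AverageValue target.
            wanted = max(1, math.ceil(metric / self.target))
            self.replicas = min(self.max_replicas, max(self.min_replicas, wanted))
        elif (self.replicas > 0 and self.last_active is not None
              and now - self.last_active >= self.cooldown):
            # Idle for a full cooldown period: drop to min_replicas (zero).
            self.replicas = self.min_replicas
        # Otherwise idle but inside the cooldown: this sketch holds the current
        # count (a real HPA would shrink toward 1 before KEDA goes to 0).
        return self.replicas


if __name__ == "__main__":
    s = Scaler(max_replicas=3)
    print(s.step(0, now=0))     # 0: no traffic, stay at zero
    print(s.step(2.5, now=10))  # 3: ceil(2.5 / 1.0) replicas
    print(s.step(0, now=300))   # 3: idle, but still inside the cooldown
    print(s.step(0, now=611))   # 0: cooldown elapsed, back to zero
```

For a CPU pod, the jump from 0 to 1 costs a few seconds. The rest of the problem lives in that one transition.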

GPU pods are different:
