Measuring AI Gateway Failover: 30 Days of Production Data

TL;DR: We measured failover latency across three AI gateways (Bifrost, LiteLLM, Portkey) during 30 days of production traffic at Nexus Labs. Bifrost added 11ms p99 overhead with automatic provider fallback. The model is the easy part. Routing it reliably is not.

Our agent platform at Nexus Labs handles around 2.4M LLM requests per day. Half of those hit OpenAI; the rest are spread across Anthropic, Bedrock, and Vertex. When OpenAI had its 4-hour incident on April 23, we lost 38 minutes of traffic before our homegrown retry logic gave up and rerouted.

That hurt. So we replaced the retry layer.

The actual problem

Most gateway benchmarks measure throughput on a cold path with no failures. That tells you very little about production. What I care about: how long does it take for a request to recover when a provider returns 429 or 503? How much p99 latency does the gateway add when nothing is wrong?
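To make those two questions concrete, here is a minimal Python sketch of how they could be measured. It is not the harness we ran, and it is not any gateway's API: `call_with_failover`, `Attempt`, and `percentile` are hypothetical names, providers are stand-in callables that return an HTTP status code, and "recovery time" is assumed to mean the time from the start of the first failed attempt to the first non-retryable response.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Statuses that trigger a fallback to the next provider (429 and 503, as in the text).
RETRYABLE = {429, 503}


@dataclass
class Attempt:
    provider: str      # label only; no real provider is called in this sketch
    status: int        # HTTP status returned by the stand-in callable
    elapsed_ms: float  # wall time spent on this single attempt


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[], int]]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[List[Attempt], Optional[float]]:
    """Try providers in order and return (attempts, recovery_ms).

    recovery_ms is 0.0 if the first provider answered, None if every
    provider returned a retryable status, and otherwise the time from the
    start of the first failed attempt to the end of the answering one.
    """
    attempts: List[Attempt] = []
    first_failure_start: Optional[float] = None
    for name, send in providers:
        start = clock()
        status = send()
        end = clock()
        attempts.append(Attempt(name, status, (end - start) * 1000.0))
        if status in RETRYABLE:
            if first_failure_start is None:
                first_failure_start = start
            continue  # fall through to the next provider
        if first_failure_start is None:
            return attempts, 0.0
        return attempts, (end - first_failure_start) * 1000.0
    return attempts, None


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not samples:
        raise ValueError("percentile of an empty sample set is undefined")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]
```

Under these assumptions, the no-failure overhead is the difference between `percentile(gateway_latencies_ms, 99)` and `percentile(direct_latencies_ms, 99)` over the same traffic window, and the failover number is the p99 of `recovery_ms` across requests that needed at least one fallback. The injectable `clock` is there so the timing logic can be checked with fixed timestamps.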
