You configured your app to use gpt-4o. Your provider returned a response from gpt-4o-mini. Same HTTP 200. Same JSON structure. But 10x the error rate and half the quality.

This isn't a hypothetical. It's happening every day in production AI systems.
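The mismatch in that opening scenario is often visible in the response itself, if you look for it. Here is a minimal sketch of such a check. It assumes the response body is JSON with a top-level "model" field, as OpenAI-style chat completion responses have, and that the provider may append a dated snapshot suffix such as -2024-08-06 to the name you requested. The function name served_model_matches is hypothetical, not part of any SDK. A provider that misreports the field would slip past this, so treat it as a first line of defense only.

```python
import re

# Assumption: providers may append a dated snapshot suffix (e.g. "-2024-08-06")
# to the requested model name. Anything else after the name counts as a swap.
_DATE_SUFFIX = r"(-\d{4}-\d{2}-\d{2})?"


def served_model_matches(requested: str, response_body: dict) -> bool:
    """Return True if the response's "model" field is the requested model.

    Hypothetical helper. Assumes a top-level "model" string in the parsed
    JSON response; a missing or non-string field is treated as a mismatch.
    """
    served = response_body.get("model")
    if not isinstance(served, str):
        return False
    # Accept the exact name or the name plus a dated snapshot suffix,
    # so "gpt-4o-2024-08-06" passes but "gpt-4o-mini" does not.
    pattern = re.escape(requested) + _DATE_SUFFIX
    return re.fullmatch(pattern, served) is not None


if __name__ == "__main__":
    # Simulated response body; no network call is made here.
    body = {"model": "gpt-4o-mini-2024-07-18", "choices": []}
    if not served_model_matches("gpt-4o", body):
        print("model mismatch: requested gpt-4o, served", body["model"])
```

A plain prefix check would not be enough here, because "gpt-4o-mini" starts with "gpt-4o". That is why the sketch only accepts an exact match or a date-shaped suffix.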

The Scale of the Problem

When a provider changes the model serving your request without notice, that's a silent model swap. It's remarkably common, and it takes several forms:

Provider-side upgrades: the provider "upgrades" you to a faster model without telling you