Your dashboards go red. Not one model. All of them. Retries spike. Latency climbs. Nothing recovers.
When every Claude call starts failing at once
If you ship anything serious on top of LLM APIs, you have lived this moment.
Requests that worked minutes ago now return errors. You flip the model flag. Same thing. You increase retries. Error rate goes up, not down. Queues back up. Downstream services feel it.
This is not a prompt issue. This is not a bad deploy. This is what a platform-level incident looks like from the client side.






