Why cloud outages are such a stubborn problem

For years, the cloud market has made a simple promise: Move workloads to large-scale platforms, gain better resilience, and worry less about downtime. That promise was never entirely wrong, but it is becoming less complete. The latest findings from Uptime Institute’s seventh Annual Outage Analysis suggest that the outage landscape is changing in ways that should concern both cloud providers and cloud customers. The biggest risks are no longer limited to broken physical infrastructure. They are increasingly tied to the complexity of the systems used to run, coordinate, update, and recover that infrastructure.

The most alarming number in the report is that IT and networking issues accounted for 23% of impactful outages in 2024. Uptime Institute links these outages to growing IT and network complexity; the long-term shift toward colocation, cloud, and third-party digital services; and the resulting increase in change-management failures and misconfigurations. That number is more than a statistical footnote. It points to a structural change in how outages happen and why cloud outages are becoming such a stubborn problem.

Hardware redundancy can protect against component failures, but it doesn’t help much when the outage stems from a bad configuration, an automation error, a faulty network change, or an underappreciated control-plane dependency. In those cases, the infrastructure itself may remain intact while the system that governs it breaks down. The industry is learning that resiliency is less about duplicating equipment and more about managing complexity. Today’s increasingly distributed and software-defined environments cannot operate safely at scale unless that complexity is managed.
