The site was "up." The monitor said so. HTTP 200, response times normal, no alerts.

What the monitor didn't know - what I didn't know - was that our SSL certificate had expired 87 minutes earlier and every user hitting the site was getting a certificate error in their browser. Not a down page. Not a 5xx. A cert error. The kind where browsers show a big red warning screen and most users immediately close the tab.

For a checkout flow, that's about as bad as the server being down. Worse, actually, because at least a down server triggers your uptime alert.

This is the post-mortem.

What happened