Let me vent for a second, because last week broke me a little.
I've been doing SRE / database ops for about 5 years, and I keep relearning the same lesson the hard way: having backups and being able to recover are two completely different things.
Here's what happened.
A developer ran a SQL statement against production and accidentally deleted a single row of live data. One row. Sounds trivial to fix, right? Just put it back. But there was a hard constraint: I couldn't touch or overwrite any of the other live data while doing it. No "just restore the table and hope nothing else changed."
So here's what "having backups" actually looked like in practice:







