Someone on our team was cleaning up feature flags. It was a good instinct...we had a pile of them, and plenty were clearly dead. One in particular had been turned off for over a year. It looked about as safe to delete as anything could look. So they deleted it.

And it turned a feature back on.

Here's the part that still makes me put palm to forehead. Deleting the flag didn't delete the feature that depended on it. There was still code out there, half-forgotten, that checked that flag before doing its thing. While the flag existed and was off, that code stayed quiet. When the flag got deleted, the check didn't blow up...it fell back to the default value written into the check itself. And that default was true.
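To make the trap concrete, here's a minimal sketch in Python. It assumes a flag lookup that takes a caller-supplied default and returns it whenever the flag doesn't exist, which is a common pattern but not necessarily how our system works. `FlagStore`, `is_enabled`, `legacy_feature_active`, and the flag name `legacy_feature` are all hypothetical stand-ins.

```python
from typing import Dict


class FlagStore:
    """Hypothetical in-memory flag store standing in for a real flag service."""

    def __init__(self) -> None:
        self._flags: Dict[str, bool] = {}

    def set(self, name: str, value: bool) -> None:
        self._flags[name] = value

    def delete(self, name: str) -> None:
        # "Cleanup": the flag simply stops existing in the store.
        self._flags.pop(name, None)

    def is_enabled(self, name: str, default: bool) -> bool:
        # A missing flag is not an error here: the caller's default wins.
        return self._flags.get(name, default)


store = FlagStore()


def legacy_feature_active() -> bool:
    # Hypothetical half-forgotten call site with a hard-coded default of True.
    return store.is_enabled("legacy_feature", default=True)


store.set("legacy_feature", False)
print(legacy_feature_active())  # False: the flag exists and is off

store.delete("legacy_feature")
print(legacy_feature_active())  # True: the flag is gone, so the default applies
```

Nothing in that sequence raises an error. From the flag system's point of view, deleting the flag and turning it on look the same to any caller whose default is `True`.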

So a feature nobody had thought about in a year quietly switched itself on. "Cleanup" turned out to be a deploy.

On its own, that feature wasn't a big deal. The problem was it collided with another, more important feature. The two were never meant to be running at the same time, and when they were, the second one broke too. Now we had a real bug in production. And here's where I'll own a mistake...our observability wasn't as good as it should have been. We didn't catch it. We found out when a user told us.