The pager goes off at 02:14. Checkout latency is up, error rate is climbing, and you have three dashboards, a wall of logs, and a half-awake brain. The fix, once you know what's wrong, is usually fast. The expensive part is the triage — the first fifteen minutes of "what is actually broken, and what changed?"
That triage window is exactly where AI helps most, and exactly where it's most dangerous if you let it run commands. This is how to use it to go faster without handing it the keys to production.
The rule that makes AI safe during an incident
AI reads and reasons. Humans run commands.
During an active incident you are sleep-deprived and under time pressure, the worst possible state in which to paste a command you don't fully understand. So draw a hard line: AI is allowed to look at evidence and propose a plan. It is never allowed to execute anything. Every command it suggests goes through your eyes and your hands.
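One way to make that line hard to cross by accident is to treat every AI suggestion as plain data that can only be printed, never run. The sketch below is a minimal illustration under stated assumptions, not a finished tool: `Suggestion`, `flag_risk`, `render_for_human`, and the `MUTATING_WORDS` list are all hypothetical names invented here, and the keyword match is a crude hint rather than a safety guarantee. Note what is missing on purpose: there is no `subprocess` import and no code path that executes anything.

```python
from dataclasses import dataclass

# Hypothetical list of words that usually mean "this changes state".
# It is a rough heuristic; tune it for your own tooling.
MUTATING_WORDS = {"delete", "restart", "rollout", "apply", "scale", "drop", "rm", "kill"}


@dataclass(frozen=True)
class Suggestion:
    command: str    # the command text the AI proposed
    rationale: str  # why the AI thinks it would help


def flag_risk(suggestion: Suggestion) -> str:
    """Label a suggestion 'mutating' or 'read-only' by simple keyword match."""
    words = set(suggestion.command.lower().split())
    return "mutating" if words & MUTATING_WORDS else "read-only"


def render_for_human(suggestions: list[Suggestion]) -> str:
    """Format suggestions as text for a human to read. Nothing is executed."""
    lines = []
    for number, suggestion in enumerate(suggestions, start=1):
        lines.append(f"{number}. [{flag_risk(suggestion)}] {suggestion.command}")
        lines.append(f"   why: {suggestion.rationale}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative suggestions only; the namespace and deployment are made up.
    proposed = [
        Suggestion("kubectl get pods -n checkout", "check pod status"),
        Suggestion(
            "kubectl rollout restart deployment/checkout -n checkout",
            "clear pods that may be stuck",
        ),
    ]
    print(render_for_human(proposed))
```

A "read-only" label only means no keyword matched. You still read every command before you type it, and you are the one who types it.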










