A classifier confidence of 0.99 is enough to decide a tier. It is not enough to send an email you can't unsend.

Those are two different bars, and most "autonomous" systems use the first one to clear the second. That's the bug.

This is the third post in a series that started as a cheap-model brag and turned into an architecture argument. Post one: a cheap model beat GPT-4o on email triage. Post two: the model only scores four features, and a deterministic rule picks the tier. A commenter, @hannune, pointed at one of those four features:

Your reversibility signal is something I have not seen named explicitly before but it is exactly the right axis for anything that touches irreversible state.

He's right, and it's the cleanest way into the last piece of the design. So: what reversibility actually routes.