One of the strangest things about AI engineering is that your test suite can be 100% green while your product is getting worse.

Traditional software taught us to think in absolutes:

pass/fail

correct/incorrect

deterministic/non-deterministic