Originally published at devopsdiary.blog. Post F2 in the "Governing AI in the Enterprise" series.
DORA worked because shipping software was a pretty stable thing to measure. You changed code, you deployed it, you watched whether prod fell over. The four metrics held up for a decade because the activity they measured didn't change much.
Then Copilot showed up. And Cursor. And whatever your team is piloting this quarter that nobody told platform engineering about.
The activity changed. The metrics didn't. That's the gap.
I keep landing in the same place on this. You need two layers. Most teams have neither. One is an evaluation layer that watches the AI itself. The other is a governance layer that decides what the evaluation results mean. Skip either one and you end up with dashboards that look healthy while the work underneath them quietly drifts.
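Here's a rough sketch of how that split could look in code. Everything in it is made up for illustration: the `EvalResult` and `Policy` shapes, the `govern` function, the metric names, and the thresholds are assumptions, not a real tool or a standard. The point is only the separation. The evaluation layer produces numbers, and the governance layer holds the rules that say what those numbers mean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """One measurement emitted by the evaluation layer (hypothetical shape)."""
    metric: str
    value: float


@dataclass(frozen=True)
class Policy:
    """One governance rule: the minimum acceptable value for a metric."""
    metric: str
    minimum: float
    action: str  # what to do when the metric falls below the minimum


def govern(results, policies):
    """Return the actions triggered by the given evaluation results.

    A policy with no matching result is flagged too, because a missing
    measurement is itself something governance should know about.
    """
    by_metric = {r.metric: r.value for r in results}
    actions = []
    for policy in policies:
        value = by_metric.get(policy.metric)
        if value is None:
            actions.append(f"no data for {policy.metric}")
        elif value < policy.minimum:
            actions.append(policy.action)
    return actions


# Hypothetical metric names and thresholds, for illustration only.
results = [
    EvalResult("suggestion_acceptance_rate", 0.62),
    EvalResult("ai_change_test_pass_rate", 0.81),
]
policies = [
    Policy("ai_change_test_pass_rate", 0.90, "require human review"),
    Policy("suggestion_acceptance_rate", 0.30, "review tool rollout"),
    Policy("ai_change_rollback_rate_ok", 0.95, "pause AI-assisted deploys"),
]

triggered = govern(results, policies)
```

With these made-up numbers, the test pass rate falls under its threshold and one policy has no measurement behind it at all. A dashboard showing only the evaluation numbers would not tell you either of those things is a problem. The policy layer is what turns a number into a decision.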








