I’ve been running field tests against live open-source issues to study a specific question:
When AI coding agents make repairs, where do they lose the actual truth of the repo?
Not “does the AI write code?”
Not “can it pass a test?”
More specifically:






