AI agents can pass tests and still reason sloppily. They say "should work" without evidence. They present partial work as complete. They embellish.
I built a tiny tool that catches this. It checks any text along four dimensions: Completeness, Consistency, Groundedness, and Honesty.
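To make the Groundedness idea concrete, here is a rough sketch of the kind of check involved. It is not the tool's actual implementation or API: the function name `flag_unsupported_claims` and both phrase lists are hypothetical, picked only for illustration. It flags sentences that hedge ("should work") without mentioning any evidence.

```python
import re

# Hypothetical phrase lists, for illustration only (not taken from self-audit).
HEDGES = ("should work", "probably works", "i think it works")
EVIDENCE = ("tested", "verified", "output:", "passed")


def flag_unsupported_claims(text: str) -> list[str]:
    """Return sentences that contain a hedge but no evidence marker."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        lowered = sentence.lower()
        hedged = any(h in lowered for h in HEDGES)
        grounded = any(e in lowered for e in EVIDENCE)
        if hedged and not grounded:
            flagged.append(sentence)
    return flagged


report = flag_unsupported_claims(
    "The fix should work. I ran the suite and all 12 tests passed."
)
print(report)  # ['The fix should work.']
```

A keyword match like this is crude and easy to fool. It is only meant to show the shape of the question being asked: does each claim come with something that backs it up?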
Install
pip install self-audit
Usage






