AI agents pass tests while producing sloppy thinking. They say "should work" without evidence. They present partial work as complete. They embellish.

I built a tiny tool that catches this. It checks any text across four dimensions: Completeness, Consistency, Groundedness, and Honesty.

Install

pip install self-audit

Usage