I ran 4 AI agents on yesterday's PRs. Two real security bugs surfaced.

After every coding session I run a 4-agent parallel audit on the diff I just shipped. A recent session of mine was seven PRs landing a new daily-challenge feature on my open-source LMS. Two of the audit findings were real security or integrity bugs that my human review missed. This is the playbook.

The four agents

I split the audit into four narrow roles. Narrow because a generalist agent tells you everything is fine; a specialist with a clear mandate tells you what is broken.

Cleanup agent. Looks for leftover patterns from the work just done: dead references to removed roles, unused i18n keys, orphan test fixtures, dual-named scripts where one should have died.

Security agent. Auth, tokens, rotation, CSP, RLS, secrets in code, error messages that leak structure. Treats every new endpoint as hostile until proven otherwise.

I ran 4 AI agents on yesterday's PRs. Two real security bugs surfaced.

Related reading

Sentry's Span Hierarchy Exposed a Silent Retry in My 5-Agent Pipeline. One…

Your AI code reviewer ran my malware (85% hit rate)

Lessons from a 109-agent code audit workflow

Four agents, 77 projects, 90 minutes: the multi-agent Claude Code pattern I run…

I Reviewed 200+ AI-Generated PRs. Here's the 4-Round Protocol I Use Now.

I Added a 4th Agent That Audits My Other Agents. It Caught My Strategist…