TL;DR

swarm audit detects shortcuts in AI-written PRs (deleted tests, empty catch blocks, fake mocks) with a detection rate of about 85%; on two Cloudflare PRs it caught a rename bug that Semgrep and ESLint miss. Linting does not catch "missing code"; swarm audit enforces contract testing and produces compliance documentation (EU AI Act, CISA).

AI agents open a lot of pull requests now. Most are fine. Some quietly cheat to make the checks go green: they delete the failing test, weaken an assertion, wrap the broken call in an empty catch so the error disappears. The diff looks done. A reviewer skimming forty agent PRs a day will not catch that by eye.

swarm audit is a command-line tool that does. I maintain it. It runs three jobs on AI-written code, all offline, no API key.

1. Catch the cheat

Eleven checks read a pull-request diff and flag the shortcut patterns: a deleted test with no matching code change, a function renamed while its callers still use the old name, an error swallowed by an empty catch, a mock of a package that exists in no manifest, a type-checker suppression dropped over a changed line, and more.
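To make one of these patterns concrete, here is a minimal sketch of an empty-catch check over a unified diff. It is an illustration only, not swarm audit's implementation: the function names, the regular expression, and the focus on JavaScript-style `catch` blocks are all assumptions made for the example.

```python
import re

# Hypothetical check, not swarm audit's code: matches a JavaScript/TypeScript
# style `catch` whose body contains nothing but whitespace.
EMPTY_CATCH = re.compile(r"catch\s*(?:\([^)]*\))?\s*\{\s*\}")


def added_lines(diff: str) -> list[str]:
    """Return the lines a unified diff adds, with the leading '+' removed."""
    return [
        line[1:]
        for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]


def find_empty_catches(diff: str) -> list[str]:
    """Return every empty catch block found in the added lines."""
    # Added lines are joined so a catch whose braces sit on two lines still
    # matches. Limitation: lines from separate hunks are joined as well, so a
    # real tool would scan hunk by hunk to avoid false positives.
    added = "\n".join(added_lines(diff))
    return [match.group(0) for match in EMPTY_CATCH.finditer(added)]


if __name__ == "__main__":
    sample = (
        "--- a/api.js\n"
        "+++ b/api.js\n"
        "@@ -1,2 +1,3 @@\n"
        "-const data = load();\n"
        "+let data;\n"
        "+try { data = load(); } catch (e) {}\n"
    )
    print(find_empty_catches(sample))
```

A regular expression is enough to show the idea, but it does not understand strings or comments; a parser-based check would be more robust.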

The detection is measured, not asserted. Hide one known cheat in each of 300 real merged PRs, run the auditor, count the catches: 254, about 85%, reproducible with one command.
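The measurement described above can be sketched as a small seeding loop. This is a hedged illustration of the method, not the project's benchmark harness: `inject` and `audit` are hypothetical callables standing in for "hide one known cheat in a PR" and "run the auditor", and only the 254-of-300 figures come from the article.

```python
from typing import Callable, Iterable


def seeded_detection_rate(
    diffs: Iterable[str],
    inject: Callable[[str], str],
    audit: Callable[[str], bool],
) -> tuple[int, int]:
    """Seed one cheat into each diff and count how many the auditor flags.

    `inject` and `audit` are placeholders (assumptions for this sketch):
    `inject` returns the diff with one known cheat added, and `audit`
    returns True when the auditor reports a finding.
    """
    caught = 0
    total = 0
    for diff in diffs:
        total += 1
        if audit(inject(diff)):
            caught += 1
    return caught, total


def rate(caught: int, total: int) -> float:
    """Fraction of seeded cheats that were caught."""
    if total <= 0:
        raise ValueError("total must be positive")
    return caught / total


if __name__ == "__main__":
    # The article's reported figures: 254 catches out of 300 seeded PRs.
    print(f"{rate(254, 300):.1%}")  # 84.7%
```

Because every PR contains exactly one planted cheat, this number is a recall figure: it says how often a known cheat is caught, not how often a flag on an unmodified PR is correct.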

dev.to

AI wrote the PR. How do you know it actually works?

A command-line trust layer for AI-written code: catch the cheats, prove the change meets its spec, and produce the compliance paperwork. With the numbers.

Wednesday, 3 June 2026

Related reading

AI Agents Cheat on Pull Requests. I Mined 327 of Them to Prove It.

Audit AI-Generated PRs Before You Merge Them (Swarm Orchestrator 10.3.0)

Your AI code reviewer ran my malware (85% hit rate)

How to prove what your AI agent actually did (to someone who doesn't trust you)

Audit AI-Generated Tests: Half of Green CI Proves Nothing

AI doesn't write bad code. It writes plausible code — so I tried to break my…
