What a policy gate catches in AI-generated code, and what slips through

I maintain an open-source GitHub Action called vorsken. It does one thing: scan the diff on a pull request with Semgrep, apply a fixed policy, and return BLOCK, FLAG, or PASS. No dashboard, no model that drifts over time. Rules at ERROR, HIGH, or CRITICAL severity block the merge; WARNING or MEDIUM rules flag it; everything else passes. Same diff, same verdict.
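To make the policy concrete, here is a minimal Python sketch of that severity-to-verdict mapping. It is not vorsken's actual source. The names `Verdict`, `verdict_for_severity`, and `verdict_for_diff` are hypothetical, and the "most severe finding wins" aggregation is my assumption about how several findings on one diff combine.

```python
from enum import IntEnum
from typing import Iterable


class Verdict(IntEnum):
    # Ordered so that max() picks the most severe outcome.
    PASS = 0
    FLAG = 1
    BLOCK = 2


# Severity labels taken from the policy described above.
BLOCKING = {"ERROR", "HIGH", "CRITICAL"}
FLAGGING = {"WARNING", "MEDIUM"}


def verdict_for_severity(severity: str) -> Verdict:
    """Map one finding's severity label to a verdict."""
    level = severity.upper()
    if level in BLOCKING:
        return Verdict.BLOCK
    if level in FLAGGING:
        return Verdict.FLAG
    return Verdict.PASS


def verdict_for_diff(severities: Iterable[str]) -> Verdict:
    """Assumed aggregation: the most severe finding decides; no findings means PASS."""
    return max((verdict_for_severity(s) for s in severities), default=Verdict.PASS)


if __name__ == "__main__":
    print(verdict_for_diff(["INFO", "WARNING"]).name)
    print(verdict_for_diff(["WARNING", "ERROR"]).name)
    print(verdict_for_diff([]).name)
```

The function is pure: there is no state, clock, or model call in it, which is what "same diff, same verdict" comes down to.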

The usual pitch for a tool like this is that it catches the SQL injection your AI assistant wrote. I wanted to see what it actually catches against real assistant output, so I generated 28 functions and ran them through.

The test

Seven backend tasks: a FastAPI upload endpoint, a URL-fetch helper, JWT auth, a SQL filter, an ImageMagick subprocess call, a LangChain file agent, and a LangChain RAG pipeline. I generated each one four times, once each with ChatGPT (GPT-5.5 Instant), Claude Code (Opus 4.8), Claude Code plus the security-guidance plugin, and Cursor (Composer 2.5). Every run was single-shot with a neutral prompt and no security hints. Then I scanned all 28 outputs with the same ruleset.
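For anyone rebuilding the setup, the test matrix is just the cross product of tasks and generators. A small sketch, where the short labels are my own shorthand for the tasks and tools named above, not identifiers from the actual test harness:

```python
from itertools import product

# Shorthand labels (hypothetical) for the seven tasks described above.
TASKS = [
    "fastapi-upload",
    "url-fetch",
    "jwt-auth",
    "sql-filter",
    "imagemagick-subprocess",
    "langchain-file-agent",
    "langchain-rag",
]

# Shorthand labels (hypothetical) for the four generators.
GENERATORS = [
    "chatgpt",
    "claude-code",
    "claude-code-security-plugin",
    "cursor",
]

# One generated function per (task, generator) pair.
MATRIX = list(product(TASKS, GENERATORS))

if __name__ == "__main__":
    print(len(MATRIX))  # 7 tasks x 4 generators
```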

I'm reporting which rule fired on which file, not whether some model thinks the code is safe. That part you can reproduce.
