I tried to break my own MCP prompt-injection detector. One class of attack walks straight through - and it isn't a bug.

I maintain bulwark-mcp, a small open-source proxy that sits between an MCP client (Claude Desktop, Cursor) and the servers it talks to, and scans tool results for indirect prompt injection before they reach the model.

The reason that's a job worth doing: an MCP-enabled agent reads the output of every tool it calls, and it reads that output as data. A file from disk, an issue body from GitHub, a row from a database, a search snippet from the web — it all flows straight into the model's context. Except sometimes it isn't data. Anyone with write access to one of those surfaces can plant text that looks like data and reads like instructions, and the model does what the text says.

Before telling anyone the detector works, I did the thing you're supposed to do with a security tool: I tried to defeat it. Most of what I threw at it, it caught. One category didn't — and the more I dug, the clearer it got that this isn't a regex I forgot to write. It's a wall the entire field is standing in front of.

Here's the attack, why it works, and what I think it means for anyone building injection defenses.

What the detector actually does

Here's the attack, why it works, and what I think it means for anyone building injection defenses.

What the detector actually does

I tried to break my own MCP prompt-injection detector. One class of attack walks straight through - and it isn't a bug.

I tried to break my own MCP prompt-injection detector. One class of attack walks straight through - and it isn't a bug.

Related reading

I wrote a read-only scanner for MCP / agent-gateway production-readiness

Your MCP Server Is Probably Overprivileged - Here's a Scanner For It

GitHub MCP Security Scanning: How AI Coding Agents Get an Immune System

Testing and Debugging MCP

Auditing MCP Server Security: The Attack Surface Nobody Talks About

The MCP Rug Pull - When the Tool You Trusted Yesterday Becomes Malicious Today

Related reading

I wrote a read-only scanner for MCP / agent-gateway production-readiness

Your MCP Server Is Probably Overprivileged - Here's a Scanner For It

GitHub MCP Security Scanning: How AI Coding Agents Get an Immune System

Testing and Debugging MCP

Auditing MCP Server Security: The Attack Surface Nobody Talks About

The MCP Rug Pull - When the Tool You Trusted Yesterday Becomes Malicious Today