Prompt injection is role confusion, and your MCP gateway can't see it

There is a paper that reframes prompt injection in a way that is hard to unsee: Prompt Injection as Role Confusion. Its claim is that the dozens of named attacks (ignore previous instructions, hidden HTML, markdown injection, tool injection, RAG injection) are not different bugs. They are one bug: a model attributes authority by the style of text, not by the structural role tag wrapped around it. Make untrusted text sound like it comes from a privileged source and the model may obey it, regardless of where it actually came from.

For a gateway sitting between an AI client and its MCP tools, that is the whole game. A tool response is supposed to be data. But it can mimic a higher-authority voice, and the model has no reliable way to tell the difference.

The strongest version: forging the reasoning channel

The paper's most striking result is not about user or system messages. It is about the model's own reasoning. When the authors injected text that imitated a model's chain-of-thought, the forged reasoning read with higher "CoTness" than the model's genuine reasoning. Concretely: forging the reasoning channel raised jailbreak success from roughly 0% to roughly 60%, and it transferred across every model they tested. When they "destyled" the injected reasoning (stripping the characteristic phrasing), success dropped back to about 10%.

The strongest version: forging the reasoning channel

Prompt injection is role confusion, and your MCP gateway can't see it

Prompt injection is role confusion, and your MCP gateway can't see it

Other newsrooms on this story

Related reading

Interesting Paper Exploring Prompt Injection - Schneier on Security

Prompt Injection in 2026: Still OWASP's Number One LLM Vulnerability

AI Prompt Injection Defense: Building Effective Strategies in 5 Steps

Indirect Prompt Injection remains a fundamental security challenge for AI |…

I tried to break my own MCP prompt-injection detector. One class of attack…

How to Defend Against Prompt Injection in Production

Other newsrooms on this story

Related reading

Interesting Paper Exploring Prompt Injection - Schneier on Security

Prompt Injection in 2026: Still OWASP's Number One LLM Vulnerability

AI Prompt Injection Defense: Building Effective Strategies in 5 Steps

Indirect Prompt Injection remains a fundamental security challenge for AI |…

I tried to break my own MCP prompt-injection detector. One class of attack…

How to Defend Against Prompt Injection in Production