Security researchers tricked LLMs into giving them cocaine recipes by abusing role models for prompt injection

AI + ML

If you want a picture of the future of LLM security, imagine Whac-a-Mole meets Groundhog Day

Researchers say that machine learning models cannot reliably distinguish between authorized and unauthorized input, ensuring that prompt injection will continue to present a threat until developers find new ways to have machine learning systems process inputs.AI models provide responses to user-supplied prompts. The problem is that AI models may receive adversarial prompts – directly from a user or indirectly from an ingested document – that tell the model to take action contrary to its built-in system prompt.Various techniques mitigate prompt injection, but defenders have not found ways to prevent such attacks.

According to independent researchers Charles Ye and Jasmine Cui, and MIT associate professor Dylan Hadfield-Menell, no one is likely to do so under the current fragile LLM security model.

As they observe in a paper titled "Prompt Injection as Role Confusion" in the proceedings of next week's ICML 2026 conference, LLMs have come to rely on a text tagging system that defines "roles" to separate system text from user text. And roles, they argue, do not guarantee security."Role tags were a formatting trick that became the security architecture and the cognitive scaffolding of modern LLMs," the authors explain in a blog post. "We've shown that this architecture doesn't survive into the model's actual representations, and that such role confusion is linked to prompt injection."When OpenAI's ChatGPT arrived in 2022, it implemented the concept of roles – described by Anthropic a year earlier – as a way to tell the underlying model to behave in a certain way. The user role would make a request and the model, acting in the role of a helpful assistant, would respond to that request.

AI + ML

If you want a picture of the future of LLM security, imagine Whac-a-Mole meets Groundhog Day

According to independent researchers Charles Ye and Jasmine Cui, and MIT associate professor Dylan Hadfield-Menell, no one is likely to do so under the current fragile LLM security model.

Security researchers tricked LLMs into giving them cocaine recipes by abusing role models for prompt injection

Security researchers tricked LLMs into giving them cocaine recipes by abusing role models for prompt injection

Other newsrooms on this story

Related reading

Prompt injection is exploiting enterprise AI's biggest design flaws by…

AI researchers trick chatbots into sharing how to make cocaine as long as they…

Interesting Paper Exploring Prompt Injection - Schneier on Security

The Safety Feature That Taught an LLM to Lie

Prompt Injection in 2026: Still OWASP's Number One LLM Vulnerability

How I Built an LLM Honeypot to Trap Prompt Injection Attacks

Related reading

Prompt injection is exploiting enterprise AI's biggest design flaws by…

AI researchers trick chatbots into sharing how to make cocaine as long as they…

Interesting Paper Exploring Prompt Injection - Schneier on Security

The Safety Feature That Taught an LLM to Lie

Prompt Injection in 2026: Still OWASP's Number One LLM Vulnerability

How I Built an LLM Honeypot to Trap Prompt Injection Attacks

Other newsrooms on this story