AI + ML
If you want a picture of the future of LLM security, imagine Whac-a-Mole meets Groundhog Day
Researchers say that machine learning models cannot reliably distinguish between authorized and unauthorized input, ensuring that prompt injection will continue to present a threat until developers find new ways to have machine learning systems process inputs.AI models provide responses to user-supplied prompts. The problem is that AI models may receive adversarial prompts – directly from a user or indirectly from an ingested document – that tell the model to take action contrary to its built-in system prompt.Various techniques mitigate prompt injection, but defenders have not found ways to prevent such attacks.
According to independent researchers Charles Ye and Jasmine Cui, and MIT associate professor Dylan Hadfield-Menell, no one is likely to do so under the current fragile LLM security model.
As they observe in a paper titled "Prompt Injection as Role Confusion" in the proceedings of next week's ICML 2026 conference, LLMs have come to rely on a text tagging system that defines "roles" to separate system text from user text. And roles, they argue, do not guarantee security."Role tags were a formatting trick that became the security architecture and the cognitive scaffolding of modern LLMs," the authors explain in a blog post. "We've shown that this architecture doesn't survive into the model's actual representations, and that such role confusion is linked to prompt injection."When OpenAI's ChatGPT arrived in 2022, it implemented the concept of roles – described by Anthropic a year earlier – as a way to tell the underlying model to behave in a certain way. The user role would make a request and the model, acting in the role of a helpful assistant, would respond to that request.











