There's a class of bug in modern GenAI products that doesn't have a fix in Martin Fowler and Venkat Subramaniam's nine patterns — prompt injection through a chat interface to a tool. The standard mitigation is to send the user's prompt through another LLM (the "guardrail") that decides whether the prompt is malicious. That guardrail has the same properties as the model it's guarding: it's non-deterministic, hallucination-prone, and can be tricked by the same techniques it's supposed to catch. You've added an unreliable checker to an unreliable system. The probability of catastrophic failure went down. The structural possibility of it did not.
I just finished an integration in the other direction. The AI-agent surface for Stave — the cloud-security reasoning engine I've been building solo — exposes its capabilities via a Model Context Protocol (MCP) server. Agents call typed methods: search, diff, gaps, readiness, compliance. They get back structured data. There is no prompt. There is no free-text channel for the agent to inject into. The "guardrail" is the type system. The problem class of prompt injection is not mitigated. It is structurally unreachable. The architecture doesn't have the surface for the attack to exist.








