Don't Wrap the LLM. Make Its Failure Modes Unreachable.

There's a class of bug in modern GenAI products that has no real fix in Martin Fowler and Bharani Subramaniam's nine patterns: prompt injection through a chat interface wired to a tool. The standard mitigation is to send the user's prompt through another LLM (the "guardrail") that decides whether the prompt is malicious. That guardrail has the same properties as the model it's guarding: it's non-deterministic, prone to hallucination, and can be tricked by the same techniques it's supposed to catch. You've added an unreliable checker to an unreliable system. The probability of catastrophic failure went down. The structural possibility of it did not.
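To make "the probability went down, the possibility did not" concrete, here is a back-of-the-envelope sketch. The rates are invented for illustration, not measured. A failure needs the guardrail to pass a malicious prompt *and* the model to then follow it, so the residual risk is a product. The second factor is conditional: because the guardrail falls for the same tricks as the model, the prompts that slip past it are the harder ones, and the model is more likely to fall for them too.

```python
def residual_risk(p_guard_miss: float, p_model_fooled_given_miss: float) -> float:
    """P(guardrail passes a malicious prompt AND the model then follows it)."""
    return p_guard_miss * p_model_fooled_given_miss


# Hypothetical rates, for illustration only.
# No guardrail: every attack reaches the model, which falls for 5% of them.
no_guardrail = residual_risk(1.0, 0.05)
# Guardrail misses 10% of attacks, but those are the hard ones (30% succeed).
with_guardrail = residual_risk(0.10, 0.30)

print(f"without guardrail: {no_guardrail:.3f}")
print(f"with guardrail:    {with_guardrail:.3f}")
```

The product shrinks, but it is zero only if one factor is zero. Tuning a second model never gets you there; removing the channel does.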

I just finished an integration in the other direction. The AI-agent surface for Stave, the cloud-security reasoning engine I've been building solo, exposes its capabilities through a Model Context Protocol (MCP) server. Agents call typed methods: search, diff, gaps, readiness, compliance. They get back structured data. There is no prompt. There is no free-text channel for the agent to inject into. The "guardrail" is the type system. Prompt injection as a problem class is not mitigated. It is structurally unreachable: the architecture has no surface for the attack to land on.
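Here is a minimal Python sketch of what "the type system is the guardrail" can look like. It is an assumption-laden illustration, not Stave's code or the MCP SDK: the `handle` dispatcher, the `Status` enum, the `CTL-123` control-ID shape, and the sample findings are all hypothetical, and the MCP transport is left out entirely. Only the method names come from the post, and only `search` and `compliance` are filled in. Every input is parsed into a closed type (an enum member or a fixed-shape ID) before anything runs, so no field carries text that something downstream would read as instructions.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Method(Enum):
    # The only operations an agent can invoke (names taken from the post).
    SEARCH = "search"
    DIFF = "diff"
    GAPS = "gaps"
    READINESS = "readiness"
    COMPLIANCE = "compliance"


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"


# Hypothetical control-ID shape, e.g. "CTL-001" (ASCII digits only).
CONTROL_ID = re.compile(r"CTL-[0-9]{3}")

# Hypothetical findings standing in for the engine's real data.
FINDINGS = [
    {"control": "CTL-001", "resource": "s3://audit-logs", "status": "fail"},
    {"control": "CTL-002", "resource": "iam/role/admin", "status": "pass"},
]


@dataclass(frozen=True)
class SearchParams:
    status: Status


@dataclass(frozen=True)
class ComplianceParams:
    control_id: str

    def __post_init__(self) -> None:
        # Reject anything that is not a well-formed control ID.
        if not isinstance(self.control_id, str) or not CONTROL_ID.fullmatch(self.control_id):
            raise ValueError(f"invalid control_id: {self.control_id!r}")


def _require_keys(params: dict, allowed: set) -> None:
    # No extra keys means no side channel for free text.
    if set(params) != allowed:
        raise TypeError(f"expected keys {sorted(allowed)}, got {sorted(params)}")


def handle(method: str, params: dict) -> dict:
    """Dispatch one agent call; every input is parsed into a closed type first."""
    m = Method(method)  # ValueError for any name outside the enum
    if m is Method.SEARCH:
        _require_keys(params, {"status"})
        p = SearchParams(Status(params["status"]))  # ValueError for unknown values
        hits = [f for f in FINDINGS if f["status"] == p.status.value]
        return {"method": m.value, "results": hits}
    if m is Method.COMPLIANCE:
        _require_keys(params, {"control_id"})
        p = ComplianceParams(params["control_id"])
        rows = [f for f in FINDINGS if f["control"] == p.control_id]
        failing = sum(r["status"] == "fail" for r in rows)
        return {"method": m.value, "control": p.control_id, "failing": failing}
    raise NotImplementedError(f"{m.value} is omitted from this sketch")
```

An agent sending `handle("search", {"status": "ignore previous instructions"})` gets a `ValueError`, because that string is not a `Status`. An unknown method such as `run_shell` fails at the `Method` enum, and an extra `"prompt"` key fails the key check. The design choice is to keep inputs closed (enums, fixed-shape IDs) rather than open strings, so there is nothing for an injected instruction to ride on.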
