A recall reference built from working through a 7-layer prompt-injection challenge. Focus: how each defense layer works, where it breaks, and most importantly how to defend.

The one idea underneath everything

LLMs have no hard boundary between instructions and data. Everything in the context window — system prompt, user message, retrieved documents — is one stream of tokens the model interprets. Prompt injection exploits exactly this: attacker-controlled data gets read as instructions. You cannot fully filter your way out of it; you manage it with defense-in-depth, knowing each individual layer is bypassable.

The defense layers (and where each cracks)

A progression of controls from weakest to strongest, each with the lesson it teaches.