I ran my own AI chatbot plugin through a security review before release, and it came back with 35 bugs. Three were critical. The one that made my stomach drop was an HTML injection coming from unsanitized model output.
I had spent all my worry on the input side: prompt injection, the path where a user types a malicious instruction. What actually bit me was the output. The model handed back a string, I treated it as trustworthy, rendered it, and the hole opened right there.
This is a defensive writeup, not an attack guide. It's the three holes I found in my own code and how I closed them, with language-agnostic pseudocode. I build this plugin, so these are my mistakes, not someone else's.
Everyone guards the input. The output leaks.
Prompt injection has been covered to death, and that's good. "The natural-language version of SQL injection" is a framing most developers now carry, and the instinct to distrust the input path has spread.








