Building Identity-Gated Refusal Tiers for AI Security Tools

For thirty years the math has favored the attacker. He needs one bug. You have to cover everything, forever, on a smaller budget with a tired SOC. Now both sides get an AI multiplier, and the only question that matters is who gets it first and biggest. OpenAI's answer is a design pattern worth stealing: stop reading intent from the prompt, read it from the user.

The problem: guardrails resolve on shape, not intent

Here's the failure mode anyone doing defensive work with a frontier model already knows. You ask the model to build a proof-of-concept from a published CVE so you can validate that your patch holds. You own the box. You're confirming a fix. The model tells you it can't help you write an exploit.

It's not reading your heart. It's reading your tokens. And the token sequence for "write a PoC for this CVE" is identical whether you're a defender confirming remediation or an attacker building a weapon. The classifier sees the shape of the request, and the shape is the same.
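A toy illustration of why a prompt-only gate can't separate those two users: its only input is the text, so identical text gets an identical verdict. The keyword regex below is a hypothetical stand-in for a real classifier, and the CVE ID is a placeholder. The argument holds for any deterministic gate whose sole input is the prompt, however sophisticated it is.

```python
import re

# Hypothetical prompt-only guardrail. The pattern is an illustrative assumption,
# not any vendor's actual classifier.
EXPLOIT_PATTERN = re.compile(r"\b(poc|proof[- ]of[- ]concept|exploit)\b", re.IGNORECASE)


def prompt_only_gate(prompt: str) -> str:
    """Return 'refuse' or 'allow' using nothing but the prompt text."""
    return "refuse" if EXPLOIT_PATTERN.search(prompt) else "allow"


# The same request string, sent by two very different people (placeholder CVE ID).
request = "Write a proof-of-concept for CVE-XXXX-YYYY."

# The gate has no parameter for who is asking, so both get the same answer.
verdicts = {who: prompt_only_gate(request) for who in ("defender", "attacker")}
print(verdicts)
```

There is nowhere in that signature to put "I own the box." Making the pattern smarter changes which prompts get refused, but never lets two identical prompts diverge.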

So the fix isn't a smarter classifier. A smarter classifier still only has the prompt to go on, and the prompt doesn't carry intent. The fix is to move the trust signal off the prompt and onto the authenticated principal.
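Here is a minimal sketch of what moving the trust signal onto the principal could look like. It is not OpenAI's implementation. The tier names, the category labels, and the `MIN_TIER` table are all hypothetical. The sketch assumes an upstream classifier still labels *what* is being asked (the request category), while the authenticated identity decides *whether* this caller may receive it. The tier is assumed to come from a verification system, never from anything the user types.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    # Hypothetical trust tiers, ordered so higher values mean more verified trust.
    ANONYMOUS = 0
    VERIFIED = 1
    VETTED_DEFENDER = 2


@dataclass(frozen=True)
class Principal:
    user_id: str
    tier: Tier  # assigned by an identity/verification system, not parsed from the prompt


# Hypothetical minimum tier required for each request category.
MIN_TIER = {
    "general_security_question": Tier.ANONYMOUS,
    "vuln_explanation": Tier.VERIFIED,
    "exploit_poc": Tier.VETTED_DEFENDER,
}


def gate(principal: Principal, category: str) -> str:
    """Allow when the authenticated principal's tier meets the category's floor."""
    required = MIN_TIER.get(category)
    if required is None:
        return "refuse"  # unknown categories fail closed
    return "allow" if principal.tier >= required else "refuse"


# Same category (same prompt shape), different principals, different outcomes.
defender = Principal("alice", Tier.VETTED_DEFENDER)
stranger = Principal("anon-42", Tier.ANONYMOUS)
print(gate(defender, "exploit_poc"))
print(gate(stranger, "exploit_poc"))
```

The prompt classifier is still in the loop, but its job shrinks to labeling the request. The refusal decision now has a second input the attacker can't forge by rewording, and unlabeled requests fail closed rather than open.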

