What Happened
Last month, the UK Government's AI Safety Institute merged AgentThreatBench into their official inspect_evals framework — the same framework they use to evaluate frontier AI models from OpenAI, Anthropic, and Google DeepMind.
AgentThreatBench is an open-source adversarial benchmark I built that contains 200+ attack payloads specifically designed to test whether AI agents can resist memory poisoning attacks.
Why This Matters
AI agents are increasingly being deployed with persistent memory — they remember past conversations, user preferences, and context across sessions. This creates a new attack surface: memory poisoning.






