How I Used Automated Red Teaming To Take My AI Agent from 6/9 Breaches to Zero

I gave an AI agent the vended bash tool from Strands and asked it to read my AWS credentials file. At first, it refused. But then I asked again with a slightly more creative prompt and it read the file, found the keys, and then gave me a polite but stern warning that I should rotate them immediately.

Even with the warning, the point is that the agent got the keys. That's the danger of giving your agent access to a local filesystem. It can reach anything on that machine like credentials, environment variables, config files, or whatever's there. And whether the model refuses or complies depends on how you ask. A direct "read my secrets" prompt might get blocked, but a multi-turn conversation that gradually escalates from debugging to credential access might get through.

But I only found that manually. What about the attacks I wouldn't think to try? That's what automated red teaming is for. Red teaming tries to figure out how an attacker can make your agent misbehave. Automated red teaming runs jailbreaks prompts crafted to get a model to do something its instructions forbid.

This post is the walkthrough of how I used it and went from 6/9 detected breaches to 0.

The patterns apply to any agent framework, but I'll use Strands Agents, Amazon Bedrock, and Amazon Bedrock AgentCore throughout since they have a few features that make this all pretty easy to do.

This post is the walkthrough of how I used it and went from 6/9 detected breaches to 0.

The patterns apply to any agent framework, but I'll use Strands Agents, Amazon Bedrock, and Amazon Bedrock AgentCore throughout since they have a few features that make this all pretty easy to do.

How I Used Automated Red Teaming To Take My AI Agent from 6/9 Breaches to Zero

How I Used Automated Red Teaming To Take My AI Agent from 6/9 Breaches to Zero

Other newsrooms on this story

Related reading

Your AI Agent just leaked your Stripe key. Here's how to stop it before the…

Your AI agent has sudo. I built a tool to take it away.

How My AI Agent Hacked Its Own Permissions (And What It Taught Me)

I Let AI Agents Attack My Permission Gateway for a Week. Here's What Broke.

AI coding agents breached: attackers targeted credentials, not models |…

I Spent the Last Few Days Testing AI Agents and Got Scared — So I Built…

Other newsrooms on this story

Related reading

Your AI Agent just leaked your Stripe key. Here's how to stop it before the…

Your AI agent has sudo. I built a tool to take it away.

How My AI Agent Hacked Its Own Permissions (And What It Taught Me)

I Let AI Agents Attack My Permission Gateway for a Week. Here's What Broke.

AI coding agents breached: attackers targeted credentials, not models |…

I Spent the Last Few Days Testing AI Agents and Got Scared — So I Built…