My AI Agent Found a Bug in Its Own System

I spent two semesters building an AI agent that runs penetration tests. For the non-hackers in the room, a penetration test is a security assessment where you try to break into a system to find vulnerabilities before someone else does. My project aims to automate this process. The agent proposes commands, executes them on an isolated virtual machine over SSH, and chains together multi-step attack workflows the same way a human tester would. Every action passes through safety and approval gates before it touches the target. The whole point is governed autonomy: the agent does the work, but the system keeps it honest and safe.
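As a rough illustration of that propose, gate, execute flow, here is a minimal Python sketch. It is not the actual A.E.G.I.S. code: the names `safety_gate`, `make_approval_gate`, `run_gated`, and the deny-list are all hypothetical, and the executor is a stand-in function rather than a real SSH call.

```python
import shlex
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical deny-list; the real safety gate's rules are not described in this post.
BLOCKED_BINARIES = {"rm", "shutdown", "reboot", "mkfs"}


@dataclass
class GateResult:
    allowed: bool
    reason: str


def safety_gate(command: str) -> GateResult:
    """Reject commands that cannot be parsed or whose first token is deny-listed."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return GateResult(False, "unparseable command")
    if not tokens:
        return GateResult(False, "empty command")
    if tokens[0] in BLOCKED_BINARIES:
        return GateResult(False, f"blocked binary: {tokens[0]}")
    return GateResult(True, "passed safety gate")


def make_approval_gate(approve: Callable[[str], bool]) -> Callable[[str], GateResult]:
    """Wrap an approval callback (a human prompt, a policy check) as a gate."""
    def gate(command: str) -> GateResult:
        if approve(command):
            return GateResult(True, "approved")
        return GateResult(False, "approval denied")
    return gate


def run_gated(
    command: str,
    gates: List[Callable[[str], GateResult]],
    execute: Callable[[str], str],
) -> str:
    """Run the command only if every gate allows it; stop at the first refusal."""
    for gate in gates:
        result = gate(command)
        if not result.allowed:
            return f"REJECTED ({result.reason})"
    return execute(command)


def fake_execute(command: str) -> str:
    """Stand-in for the SSH executor; it only echoes the command."""
    return f"ran: {command}"


gates = [safety_gate, make_approval_gate(lambda cmd: True)]
print(run_gated("curl -s http://target.example/login", gates, fake_execute))
print(run_gated("rm -rf /", gates, fake_execute))
```

The design choice worth noticing is that the executor is never reachable except through `run_gated`, so a proposed command cannot touch the target unless every gate has returned an explicit allow.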

The project is called A.E.G.I.S., and by the end of it the agent had autonomously confirmed a critical SQL injection (a way to manipulate a database through user input) against a Hack The Box lab target. But the most useful thing it ever did had nothing to do with finding vulnerabilities in a target. It found bugs within itself.

The thing nobody tells you about building agentic systems

When your AI agent does something wrong, the natural instinct is to add a rule. Agent curling static assets? Add a rule that says never curl static assets. Agent skipping important paths? Add a mandatory action queue that forces it to check them. Agent not following your priority system? Number the priorities P0 through P7 and make them non-negotiable.
