I spent two semesters building an AI agent that runs penetration tests. For the non-hackers in the room, a penetration test is a security assessment where you try to break into a system to find vulnerabilities before someone else does. My project aims to automate that process. The agent proposes commands, executes them on an isolated virtual machine over SSH, and chains together multi-step attack workflows the same way a human tester would. Every action passes through safety and approval gates before it touches the target. The whole point is governed autonomy: the agent does the work, but the system keeps it honest and safe.
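To make the "every action passes through gates" idea concrete, here is a minimal sketch of a propose, gate, execute step. It is not A.E.G.I.S. code: the deny-list, the `safety_gate` and `run_gated` functions, and the `fake_execute` stub (which stands in for the SSH call to the isolated VM) are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical deny-list for illustration; not taken from the project.
BLOCKED_SUBSTRINGS = ("rm -rf", "shutdown", "mkfs")


@dataclass
class Decision:
    allowed: bool
    reason: str


def safety_gate(command: str) -> Decision:
    """Reject commands that contain an obviously destructive pattern."""
    for pattern in BLOCKED_SUBSTRINGS:
        if pattern in command:
            return Decision(False, f"blocked pattern: {pattern}")
    return Decision(True, "passed safety check")


def run_gated(
    command: str,
    approve: Callable[[str], bool],
    execute: Callable[[str], str],
) -> str:
    """Run `command` only if it clears the safety gate, then the approval gate."""
    decision = safety_gate(command)
    if not decision.allowed:
        return f"REJECTED ({decision.reason})"
    if not approve(command):
        return "REJECTED (not approved)"
    return execute(command)


def fake_execute(command: str) -> str:
    # Stand-in for running the command on the isolated VM over SSH.
    return f"ran: {command}"


if __name__ == "__main__":
    always_yes = lambda cmd: True  # a real approver might prompt a human
    print(run_gated("nmap -sV 10.0.0.5", always_yes, fake_execute))
    print(run_gated("rm -rf /", always_yes, fake_execute))
```

The ordering is the design choice worth noticing: the automated safety check runs first, so a human approver (or approval policy) only ever sees commands that have already cleared it, and nothing reaches the executor without passing both.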
The project is called A.E.G.I.S., and by the end of it the agent had autonomously confirmed a critical SQL injection (a way to manipulate a database through user input) against a Hack The Box lab target. But the most useful thing it ever did had nothing to do with finding vulnerabilities in a target. It found bugs in itself.
The thing nobody tells you about building agentic systems
When your AI agent does something wrong, the natural instinct is to add a rule. Agent curling static assets? Add a rule that says never curl static assets. Agent skipping important paths? Add a mandatory action queue that forces it to check them. Agent not following your priority system? Number the priorities P0 through P7 and make them non-negotiable.






