Hey everyone!

With tools like Cursor, Claude Desktop, and various MCP servers becoming part of our daily workflows, I started worrying a bit about the attack surface of having autonomous, stateful AI agents running locally. What happens if an agent pulls down a poisoned package or executes a malicious tool?

To try and solve this for myself, I built W.H.Agent (White Hat Agent). It’s an open-source CLI and sandboxing tool designed to act as a pre-execution and runtime defense for AI agents.

To be completely honest, it’s still very much a work in progress (the OS-native sandboxing is currently macOS-only, for example), and I’m sure there are edge cases I haven't even thought of yet. But I decided to open-source it today because I genuinely want to see if this approach brings value to other developers.

A few things it currently does: