This is a submission for the Hermes Agent Challenge.

I built a Hermes agent last week that takes a customer support email, decides whether it needs a refund, and either issues one or escalates to a human. Standard stuff. The agent worked. The problem started the moment I turned on audit logging.

Every run wrote a JSONL row to disk. Every row contained the full inbound message, the tool calls, the tool outputs, and the final reply. Within an hour the log had:

41 customer email addresses

7 partial credit card numbers (people paste them into support tickets, then apologize)