The experiment was simple: point Claude Code at a real project, give it a task list, stand back for 24 hours, and document what actually came back. No hand-holding. No mid-session prompts. Just a system prompt, a tool set, and an instruction file that spelled out what I wanted done by morning.

What came back was not what I expected. Some of it was better. Some of it was a mess that required a full afternoon to untangle. All of it was useful data for anyone trying to build persistent, autonomous agent workflows.

The Setup

The project was a Python-based recon automation tool I had been working on incrementally. Backend was functional but the codebase was disorganized, the output formatting was inconsistent, and I had a backlog of about 15 documented issues ranging from minor refactors to one genuinely unpleasant bug in the rate-limiting logic.

Claude Code ran inside a tmux session on a headless Ubuntu VPS. The CLAUDE.md file at the project root defined the task priority order, which directories were off-limits, what the output format for completed tasks should look like, and a hard rule: if it encountered something that required a decision with more than two plausible outcomes, it should stop and write a BLOCKED.md file describing the ambiguity rather than picking arbitrarily.