We run a self-healing AI agent system (Kaizen Harness, open source on GitHub). It handles council debates on architecture, daily tech scans, trajectory logging, and automated patching. Tokens add up fast. After a month of tuning, we cut costs 60% with zero quality loss. Here are the patterns that moved the needle, ordered from biggest impact down.

1. Context engineering: stop re-reading your own history

This was the single biggest win. Our agents were burning 40-50% of tokens re-parsing conversation history that hadn't changed since turn 3. The fix, derived from production patterns used by Manus and Cognition:

Append-only design. Every agent response starts with a [STATUS] header that replaces the full history recap. It carries three things: the goal, the completed steps, and the next step.

[STATUS] Building PR auto-review pattern. Step 2/4 complete (diff parser done). Next: wire council debate.
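
One way to generate a header like this is sketched below. It is a minimal illustration, not the Kaizen Harness implementation: the TaskStatus class, its fields, and the "diff fetcher" step are hypothetical names made up for the example. The assumption is that the agent keeps an append-only list of finished steps and renders the header from it at the start of each response.

```python
from dataclasses import dataclass, field


@dataclass
class TaskStatus:
    """Hypothetical container for the three facts the header carries."""
    goal: str
    total_steps: int
    completed: list[str] = field(default_factory=list)  # only ever appended to
    next_step: str = ""

    def complete(self, step: str, next_step: str) -> None:
        # Append-only: earlier entries are never rewritten or reordered.
        self.completed.append(step)
        self.next_step = next_step

    def header(self) -> str:
        # Render goal, progress, and next step as one compact status line.
        done = len(self.completed)
        last = f" ({self.completed[-1]} done)" if self.completed else ""
        return (
            f"[STATUS] {self.goal}. "
            f"Step {done}/{self.total_steps} complete{last}. "
            f"Next: {self.next_step}."
        )


status = TaskStatus(goal="Building PR auto-review pattern", total_steps=4)
status.complete("diff fetcher", next_step="write diff parser")  # placeholder step
status.complete("diff parser", next_step="wire council debate")
print(status.header())
```

The point of the design is that the header is derived from a small piece of state, so its size stays roughly constant no matter how long the conversation gets.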