If you're building AI agents or running LLM pipelines in production, you already know the pain: tool outputs, logs, RAG chunks, and conversation history pile up fast. Before you know it, you're burning through tokens at a rate that makes your billing dashboard uncomfortable to look at.
Headroom is an open-source project that tackles this problem directly. It compresses everything your AI agent reads — before it ever reaches the LLM — and claims 60–95% token reduction on real workloads, with accuracy preserved.
The Core Idea
Headroom sits as a layer between your application and the LLM provider. It takes whatever your agent was about to send — a stack of tool call results, a long log file, a RAG retrieval dump — and compresses it using one of several strategies depending on the content type:
SmartCrusher handles JSON (arrays, nested objects, mixed types)








