Headroom: Cut Your LLM Token Usage by Up to 95% Without Changing Your Answers

If you're building AI agents or running LLM pipelines in production, you already know the pain: tool outputs, logs, RAG chunks, and conversation history pile up fast. Before you know it, you're burning through tokens at a rate that makes your billing dashboard uncomfortable to look at.

Headroom is an open-source project that tackles this problem directly. It compresses everything your AI agent reads — before it ever reaches the LLM — and claims 60–95% token reduction on real workloads, with accuracy preserved.

The Core Idea

Headroom sits as a layer between your application and the LLM provider. It takes whatever your agent was about to send — a stack of tool call results, a long log file, a RAG retrieval dump — and compresses it using one of several strategies depending on the content type:

SmartCrusher handles JSON (arrays, nested objects, mixed types)
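
The routing idea described above can be illustrated with a short sketch. It assumes nothing about Headroom's real API. The names `compress`, `crush_json`, and `crush_log` are hypothetical, and so are the heuristics. JSON arrays keep their first few items and their last item. Logs keep their tail plus any earlier `ERROR` lines. These are illustrative stand-ins, not SmartCrusher's actual algorithm.

```python
import json
from typing import Any


def crush_json(value: Any, head: int = 3, tail: int = 1) -> Any:
    """Hypothetical JSON compressor (NOT Headroom's SmartCrusher).

    Recursively shortens long arrays to `head` leading items, a placeholder
    string recording how many items were dropped, and `tail` trailing items.
    Objects keep all their keys; scalars pass through unchanged.
    """
    if isinstance(value, list):
        if len(value) > head + tail:
            omitted = len(value) - head - tail
            # len(value) - tail (rather than -tail) keeps tail=0 correct.
            value = value[:head] + [f"... {omitted} items omitted ..."] + value[len(value) - tail:]
        return [crush_json(v, head, tail) for v in value]
    if isinstance(value, dict):
        return {k: crush_json(v, head, tail) for k, v in value.items()}
    return value


def crush_log(text: str, keep_last: int = 20) -> str:
    """Hypothetical log compressor: keep the last `keep_last` lines, plus any
    earlier line containing 'ERROR', and note how many lines were dropped."""
    lines = text.splitlines()
    if len(lines) <= keep_last:
        return text
    earlier = lines[:-keep_last]
    errors = [ln for ln in earlier if "ERROR" in ln]
    omitted = len(earlier) - len(errors)
    return "\n".join(errors + [f"[{omitted} earlier lines omitted]"] + lines[-keep_last:])


def compress(payload: str) -> str:
    """Route a tool output to a compressor based on its content type."""
    try:
        parsed = json.loads(payload)
    except json.JSONDecodeError:
        # Not JSON: treat it as plain text / log output.
        return crush_log(payload)
    # Compact separators strip whitespace that would otherwise cost tokens.
    return json.dumps(crush_json(parsed), separators=(",", ":"))


if __name__ == "__main__":
    rows = json.dumps([{"id": i, "status": "ok"} for i in range(100)])
    print(compress(rows))
```

Fed a 100-row JSON array, this sketch keeps rows 0 to 2, a placeholder noting that 96 items were omitted, and row 99, all serialized without whitespace. A real compressor would have to decide *which* rows matter (errors, outliers, rows relevant to the query) rather than keeping fixed positions. That selection logic is the part this sketch does not attempt to reproduce.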
