I cut an AI agent's input tokens by 71% and quality held — here's the 66-task benchmark

I cut a coding agent's input tokens by 71% — from 5.07M down to 1.46M across a 66-task run — and quality stayed within model noise (63 vs 64 of 66 tasks solved).

This post is the benchmark, the failures, the honesty caveats, and the code. No "revolutionary." Just numbers you can reproduce.

TL;DR

What it is: tokdiet — a local streaming reverse proxy that sits between your agent tools (Claude Code, Cursor, Codex, custom scripts) and the model APIs (Anthropic, OpenAI, Gemini, MiniMax, anything OpenAI/Anthropic-compatible).
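To make "local streaming reverse proxy" concrete, here is a minimal sketch using only the Python standard library. It is not tokdiet's code: `UPSTREAM`, `forwardable`, `rewrite_body`, `ProxyHandler`, `serve`, and port 8787 are all hypothetical names and values chosen for illustration, and `rewrite_body` is an identity placeholder marking where a token-trimming step would plausibly sit.

```python
import http.server
import urllib.request

# Assumption: illustrative upstream; swap in whichever compatible API you use.
UPSTREAM = "https://api.anthropic.com"

# Hop-by-hop headers describe one connection and should not be forwarded.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}


def forwardable(headers):
    """Drop hop-by-hop headers plus Host and Content-Length, which are
    recomputed for the new connection."""
    skip = HOP_BY_HOP | {"host", "content-length"}
    return {k: v for k, v in headers.items() if k.lower() not in skip}


def rewrite_body(body: bytes) -> bytes:
    """Hypothetical hook: a real proxy would shrink the request here.
    This sketch passes the body through unchanged."""
    return body


class ProxyHandler(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = rewrite_body(self.rfile.read(length))
        req = urllib.request.Request(
            UPSTREAM + self.path,
            data=body,
            headers=forwardable(self.headers),
            method="POST",
        )
        # Note: non-2xx upstream replies raise urllib.error.HTTPError;
        # a fuller proxy would catch that and relay the error response.
        with urllib.request.urlopen(req) as upstream:
            self.send_response(upstream.status)
            for name, value in forwardable(upstream.headers).items():
                self.send_header(name, value)
            self.end_headers()
            # Relay bytes as they arrive so streamed responses stay streamed.
            # The handler speaks HTTP/1.0 by default, so closing the
            # connection marks the end of the body.
            while chunk := upstream.read1(4096):
                self.wfile.write(chunk)
                self.wfile.flush()


def serve(port: int = 8787) -> None:
    """Listen on localhost only; blocks until interrupted."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), ProxyHandler)
    server.serve_forever()
```

Calling `serve()` and pointing an agent tool's API base URL at `http://127.0.0.1:8787` would route its requests through `rewrite_body` before they reach the model API. The tool needs no other change, which is the appeal of the proxy design.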

The headline number: input tokens 5.07M → 1.46M = -71%; quality 63/66 vs 64/66 baseline ≈ parity. 198 paired runs. LLM-judge reports 92% similarity. Confirmed on a 2nd model at -72%.
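The headline arithmetic can be rechecked in a few lines. The token counts and solve counts below come from the figures above. The standard-error comparison is my own rough, unpaired approximation of "within noise", not the post's method. The final ratio only shows that 198 is three times 66, which would be consistent with three paired runs per task, though the post does not say so.

```python
import math

# Figures quoted in the post.
baseline_tokens = 5_070_000
proxied_tokens = 1_460_000
tasks = 66
solved_baseline = 64
solved_proxied = 63
paired_runs = 198

# Relative reduction in input tokens.
reduction = (baseline_tokens - proxied_tokens) / baseline_tokens
print(f"token reduction: {reduction:.1%}")

# Solve rates and the gap between them (exactly one task).
rate_baseline = solved_baseline / tasks
rate_proxied = solved_proxied / tasks
gap = rate_baseline - rate_proxied
print(f"solve rates: {rate_baseline:.1%} vs {rate_proxied:.1%}, gap {gap:.1%}")

# Assumption: a simple binomial standard error on the baseline rate,
# treating tasks as independent. A paired test would be more appropriate
# if per-task outcomes were available.
std_err = math.sqrt(rate_baseline * (1 - rate_baseline) / tasks)
print(f"gap smaller than one standard error: {gap < std_err}")

# Runs per task implied by the run count (an inference, not a stated fact).
print(f"paired runs per task: {paired_runs / tasks:g}")
```

On these numbers the one-task gap is smaller than a single standard error of the baseline solve rate, which fits the "within model noise" reading. It does not prove the two configurations are equivalent.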
