My Anthropic bill doubled two months in a row. Not because I was building something bigger — because I kept asking the same questions, sending bloated prompts, and defaulting to Sonnet for tasks that Haiku could handle. I built a tool to fix it. Here's how it works.
The Problem
AI API costs compound fast for three reasons. First, if you're iterating on a project, you ask similar questions repeatedly — "how does X work," "what's wrong with this code" — and pay full price every time. Second, prompts accumulate context: documentation snippets, error traces, boilerplate instructions that add hundreds of tokens but contribute nothing to the answer. Third, most people just use whatever model they defaulted to first. Claude Opus at $15/1M input tokens for a query that Haiku could answer for $1/1M is a 15x cost multiplier on every single call.
The Solution
I built ai-cost-optimizer — a local CLI that sits between your terminal and the Anthropic API. It runs a semantic cache, a prompt compressor, and a model router on every request before anything hits the network. No cloud, no subscription, no data leaving your machine. Just a Python package you install once.








