I built a simple AI proxy to cut API costs — here's what I learned

A few months ago, my OpenAI API bill suddenly jumped from a modest $30 to over $150 in one month. I wasn't even doing anything crazy — just running a small Slack bot that answered questions about our internal docs. But between repeated prompts, failed retries, and my own debugging queries, the tokens added up fast.

I tried the obvious fixes first: adding client-side caching, switching to gpt-3.5-turbo from gpt-4, and even imposing manual rate limits on myself. None of it stuck. Caching exact prompts doesn’t work when users ask the same question but rephrase it slightly. And rate limits just made the bot feel sluggish.
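To make the rephrasing problem concrete, here is a minimal sketch of an exact-match prompt cache. It is not the code from my bot: `cache_key`, `ExactMatchCache`, and `fake_llm` are hypothetical names invented for illustration, and `fake_llm` stands in for a real provider call. The key is a hash of the model name plus the lowercased, whitespace-collapsed prompt, so trivial differences still hit the cache, but a genuine rewording does not.

```python
import hashlib


def cache_key(model: str, prompt: str) -> str:
    """Exact-match key: hash of the model name plus the normalized prompt."""
    # Normalization here only lowercases and collapses whitespace (an assumption
    # for this sketch); it cannot recognize two differently worded questions.
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()


class ExactMatchCache:
    """In-memory cache that counts hits and misses."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_call(self, model, prompt, call_llm):
        key = cache_key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: this is the path that costs tokens.
        self.misses += 1
        answer = call_llm(model, prompt)
        self._store[key] = answer
        return answer


def fake_llm(model, prompt):
    """Hypothetical stand-in for a real API call."""
    return f"answer to: {prompt}"


cache = ExactMatchCache()
cache.get_or_call("gpt-3.5-turbo", "How do I reset my VPN password?", fake_llm)
# Same words, different case and trailing spaces: served from the cache.
cache.get_or_call("gpt-3.5-turbo", "how do I reset my VPN password?  ", fake_llm)
# Same question, reworded: a different key, so a second paid call.
cache.get_or_call("gpt-3.5-turbo", "What's the way to reset my VPN password?", fake_llm)

print(cache.hits, cache.misses)
```

Two of the three questions above are paid calls even though all three ask the same thing, which is roughly the pattern a chat bot with real users runs into.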

So I built a lightweight AI proxy — a thin middleware layer between my app and the LLM provider. It wasn't flashy, but it immediately stopped the bleeding. Here’s the honest story of what I did, what I broke along the way, and what I’d do differently next time.

What I tried (and what didn’t work)

Client-side caching
