I'm a solo developer with about five years of experience, mostly outside AI. For the last few months I've been getting serious about it: reading docs, building small things with Claude, and learning how it differs from the web APIs I'm used to.
That gap turned into my first published npm package, prompt-cache-optimizer. This post covers what I learned about the four ways prompt caching silently fails, and what the package does to catch them.
What prompt caching is supposed to do
When you call messages.create with a long, stable prefix (system prompt, tool definitions, retrieved documents), Anthropic lets you mark the end of that prefix with a cache_control breakpoint. On the first call, the prefix gets written to the cache at ~1.25x the normal input token price. On any subsequent call that sends an identical prefix within the cache TTL (five minutes by default), the cached tokens are read back at 10% of the normal input price.
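To make the breakpoint concrete, here is a minimal sketch of a request body with one cache_control marker on the system prompt, plus a small check on the usage block that comes back. The types are hand-written for this post rather than imported from the SDK, and buildCachedRequest and cacheOutcome are hypothetical helper names, not part of prompt-cache-optimizer or the Anthropic SDK. The field names (cache_control, cache_creation_input_tokens, cache_read_input_tokens) follow the Messages API as I understand it.

```typescript
// Hand-written shapes for this sketch; not the SDK's own types.
type CacheControl = { type: "ephemeral" };
type TextBlock = { type: "text"; text: string; cache_control?: CacheControl };

interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: TextBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
}

// Hypothetical helper: puts the stable prefix in a system block and
// marks it as the cache breakpoint. The user question stays outside
// the cached prefix because it changes on every call.
function buildCachedRequest(
  stablePrefix: string,
  question: string,
  model: string,
): MessagesRequest {
  return {
    model,
    max_tokens: 1024,
    system: [
      { type: "text", text: stablePrefix, cache_control: { type: "ephemeral" } },
    ],
    messages: [{ role: "user", content: question }],
  };
}

// The subset of the response usage block that matters for caching.
interface UsageLike {
  input_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

// Hypothetical helper: classifies a response by what the cache did.
// A response that both reads and writes is reported as "read".
function cacheOutcome(usage: UsageLike): "read" | "write" | "none" {
  if ((usage.cache_read_input_tokens ?? 0) > 0) return "read";
  if ((usage.cache_creation_input_tokens ?? 0) > 0) return "write";
  return "none";
}
```

The object returned by buildCachedRequest is what you would pass to messages.create. If cacheOutcome keeps returning "write" or "none" on calls you expected to be hits, the prefix is probably not byte-identical between calls.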
The math is incredible. The execution is finicky.
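To put rough numbers on "incredible", here is a small sketch that compares the cost of resending the same prefix with and without caching. It uses only the two multipliers quoted above (1.25x to write, 0.1x to read) and assumes every call after the first hits the cache. relativePrefixCost is a hypothetical helper written for this post, and costs are measured in units of one uncached pass over the prefix, not in dollars.

```typescript
// Multipliers taken from the figures quoted in this post; check
// current pricing before relying on them.
const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

// Cost of sending the same prefix `calls` times, in units of
// "one uncached pass over the prefix". Assumes the first call writes
// the cache and every later call lands inside the TTL and reads it.
function relativePrefixCost(calls: number): { uncached: number; cached: number } {
  if (!Number.isInteger(calls) || calls < 1) {
    throw new RangeError("calls must be a positive integer");
  }
  const uncached = calls;
  const cached = CACHE_WRITE_MULTIPLIER + (calls - 1) * CACHE_READ_MULTIPLIER;
  return { uncached, cached };
}

// Ten calls over the same prefix.
const ten = relativePrefixCost(10);
console.log(`uncached: ${ten.uncached}, cached: ${ten.cached.toFixed(2)}`);
```

Under these assumptions, ten calls cost about 2.15 prefix-passes with caching versus 10 without, and caching already wins on the second call (1.35 versus 2). The flip side is that a prefix which is written but never read back costs 1.25 instead of 1, so a cache that misses on every call is more expensive than no cache at all.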









