Every AI API call we send with the BrainGrid base agent sends the same tool definitions. Every. Single. Time.

When you're building an AI agent with a dozen tools, each with detailed schemas and descriptions, you're looking at a couple thousand tokens that get sent with every request. It's like paying for coffee whether you drink it or not.

We recently migrated to AI SDK v5 for other reasons (hello, proper TypeScript types and tool streaming), but buried in the release notes was a feature that caught our eye: support for Anthropic's ephemeral caching through providerOptions.

For context, BrainGrid is an AI-powered platform that helps developers turn messy ideas into crystal-clear specs that AI coding assistants can actually implement. We analyze your codebase, ask the right clarifying questions, and break requirements down into atomic, verifiable, AI-ready tasks. Each task becomes a precise prompt with full context, so your AI IDE (Cursor, Claude Code, etc.) gets it right the first time. Behind the scenes, we're orchestrating multiple specialized agents with dozens of tools—which is why this SDK migration touched everything.

The Problem: Why pay for slower performance?