I avoided MCP from day one. No schema overhead, no token tax. The agent called my GraphQL API directly with a behavior spec and good documentation. I assumed that was enough: clear docs, correct architecture, let the agent figure it out.

It wasn't. What changed my thinking was watching the agent burn 1,500 tokens on a single upload: it kept guessing JSON field formats wrong, reading docs spread across multiple pages, and retrying. I fixed the docs. The problem resurfaced on different fields. I fixed those too. It kept coming back. A CLI wasn't a nice-to-have. It was the only thing that actually stopped the bleeding. And it was only the first of three iterations before I stumbled into the most interesting one: letting the server talk directly to the agent.

Iteration 1: Avoiding MCP from day one

The MCP context tax is well-documented at this point, so I won't belabor it. My API surface covers 34 commands. As MCP tools, that's 34 schemas × ~180 tokens = ~6,120 tokens of constant overhead in every conversation turn, regardless of whether the agent uses them.
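To make that arithmetic concrete, here is a small sketch that reproduces the estimate and shows how it grows over a conversation. The 34 commands and ~180 tokens per schema come from the figures above. The function name and the 20-turn example are assumptions added for illustration, as is the simplification that every schema is resent in full on every turn.

```python
# Figures from the text: 34 commands, roughly 180 tokens per tool schema.
COMMAND_COUNT = 34
TOKENS_PER_SCHEMA = 180  # approximate average, not a measured value


def mcp_schema_overhead(turns: int) -> int:
    """Tokens spent on tool schemas over `turns` turns.

    Assumes every schema is included in full on every turn,
    whether or not the agent calls the tool.
    """
    return COMMAND_COUNT * TOKENS_PER_SCHEMA * turns


per_turn = mcp_schema_overhead(1)
twenty_turns = mcp_schema_overhead(20)  # hypothetical conversation length

print(per_turn)      # 6120
print(twenty_turns)  # 122400
```

The per-turn number looks tolerable on its own; the cumulative number over a longer session is what makes the overhead hard to ignore.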

I understood this from first principles and chose a different path: a SKILL.md behavior spec + direct GraphQL API calls. No registered tools, no schema overhead. The agent reads the behavior spec once when the skill is invoked, then calls the API via curl.
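For readers who have not called a GraphQL API without a client library, the sketch below shows roughly what one such direct call looks like, written with Python's standard library rather than curl. Everything specific in it is hypothetical: the endpoint URL, the `upload` mutation, the `UploadInput` type, its `title` field, and the bearer token are placeholders, not my actual schema. The request is only built, not sent.

```python
import json
import urllib.request

API_URL = "https://api.example.com/graphql"  # hypothetical endpoint

# Hypothetical mutation; the real schema's names and fields will differ.
UPLOAD_MUTATION = """
mutation Upload($input: UploadInput!) {
  upload(input: $input) { id }
}
"""


def build_graphql_request(query: str, variables: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST request."""
    # GraphQL over HTTP conventionally sends a JSON body with
    # "query" and "variables" keys.
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_graphql_request(
    UPLOAD_MUTATION,
    {"input": {"title": "demo"}},  # hypothetical field
    token="TEST_TOKEN",
)
# To actually send it: urllib.request.urlopen(req)
```

Note that nothing in this request tells the caller what shape `input` must have. That knowledge lives entirely in the docs the agent has to read.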