My server pushes hints to agents — and the 3 iterations that led there

I avoided MCP from day one. No schema overhead, no token tax. The agent called my GraphQL API directly with a behavior spec and good documentation. I assumed that was enough: clear docs, correct architecture, let the agent figure it out.

It wasn't. The moment that changed my thinking: watching the agent burn 1,500 tokens on a single upload because it kept guessing JSON field formats wrong, reading docs across multiple pages, and retrying. I fixed the docs. The problem resurfaced on different fields. I fixed those too. It kept coming back. CLI wasn't a nice-to-have. It was the only thing that actually stopped the bleeding. And it was only the first of three iterations before I stumbled into the most interesting one: letting the server talk directly to the agent.

Iteration 1: Avoiding MCP from day one

The MCP context tax is well-documented at this point, so I won't belabor it. My API surface covers 34 commands. As MCP tools, that's 34 schemas × ~180 tokens = ~6,120 tokens of constant overhead in every conversation turn, regardless of whether the agent uses them.

I understood this from first principles and chose a different path: a SKILL.md behavior spec + direct GraphQL API calls. No registered tools, no schema overhead. The agent reads the behavior spec once when the skill is invoked, then calls the API via curl.

My server pushes hints to agents — and the 3 iterations that led there

Related reading

MCP Server Design: 3 Principles We Learned in Production

I Ran 6 MCP Servers Behind One Agent. Here's What the Token Bill Actually…

I Measured MCP vs Direct API Calls: The Token Math No One Tells You

MCP in Production: Tool Design, Catalogs, and the Gateway Problem

MCP vs Direct API Calls — My Agent Stack Has Zero MCP Servers

MCP Deep Dive, Part 10: When the Agent Feels Off — Debugging and Observability…