TL;DR

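Every connected MCP server loads its full tool schemas into the context window on every request, at roughly 200 tokens per tool. Across a typical multi-server setup that adds up to 50,000 to 75,000 tokens, spent on budget and latency before any user input arrives. Disabling unused servers and trimming the tool surface reclaims tens of thousands of tokens per call, freeing context for actual work.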

Here is something I did not realize about the Model Context Protocol until my context window kept feeling full for no reason.

Every MCP server you connect loads its full set of tool definitions into the context window on every single request. Those schemas are not free. Each tool costs a few hundred tokens, and they are sent before the model reads a word of your prompt.
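To get a feel for why a single schema costs that much, here is a minimal sketch that serializes one tool definition and approximates its token count. The tool is made up for illustration. Its shape (name, description, inputSchema) follows the way MCP describes tools, and the 4-characters-per-token ratio is only a common rule of thumb, not a real tokenizer, so treat the number as a ballpark.

```python
import json

# Hypothetical tool definition, shaped like an MCP tool entry
# (name, description, inputSchema). The tool itself is invented.
example_tool = {
    "name": "search_issues",
    "description": "Search issues in a repository by keyword, label, or state.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "repo": {"type": "string", "description": "Repository in owner/name form."},
            "query": {"type": "string", "description": "Free-text search query."},
            "state": {"type": "string", "enum": ["open", "closed", "all"]},
            "limit": {"type": "integer", "description": "Maximum number of results."},
        },
        "required": ["repo", "query"],
    },
}


def approx_tokens(obj, chars_per_token=4.0):
    """Crude estimate: compact JSON length divided by an assumed chars-per-token ratio."""
    text = json.dumps(obj, separators=(",", ":"))
    return max(1, round(len(text) / chars_per_token))


print(f"~{approx_tokens(example_tool)} tokens for one tool definition")
```

Longer descriptions and deeper parameter schemas push the figure up quickly, which is why a server with dozens of tools gets expensive.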

Five typical servers, with a dozen or more tools each, commonly add up to 50,000 to 75,000 tokens of overhead per request. That is real money on every call, and latency you feel on every turn. It also crowds out the context you actually want the model to use.

Measure it first

You cannot cut what you cannot see. A rough rule is about 200 tokens per tool plus a small per-server overhead. I built a tiny tool that prints an estimate for your real config (and checks security while it is at it):

dev.to

Your MCP servers are burning 50k+ tokens before you type a word
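The rough rule above (about 200 tokens per tool plus a small per-server overhead) can be turned into a back-of-the-envelope estimator. This is not the author's tool, only a minimal sketch: the server names and tool counts are hypothetical, and the per-server overhead of 100 tokens is a placeholder, since the text only calls it "small".

```python
TOKENS_PER_TOOL = 200      # rough rule quoted in the text
PER_SERVER_OVERHEAD = 100  # placeholder assumption; the text only says "small"


def estimate_overhead(tool_counts,
                      tokens_per_tool=TOKENS_PER_TOOL,
                      per_server=PER_SERVER_OVERHEAD):
    """Return (per-server estimates, total) for a mapping of server name -> tool count."""
    per_server_totals = {
        name: count * tokens_per_tool + per_server
        for name, count in tool_counts.items()
    }
    return per_server_totals, sum(per_server_totals.values())


# Hypothetical setup: replace with the tool counts your own servers report.
tool_counts = {"github": 30, "filesystem": 12, "browser": 20}

per_server_totals, total = estimate_overhead(tool_counts)

# Print the largest contributors first, since those are the best candidates to disable.
for name, tokens in sorted(per_server_totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12} {tokens:>7,} tokens")
print(f"{'total':<12} {total:>7,} tokens")
```

A flat per-tool figure undercounts servers whose tools carry long descriptions or large parameter schemas, so a real measurement should tokenize the actual schemas rather than rely on this estimate.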
