Your MCP Server Is Eating Your Context Window. There's a Simpler Way

Wiring AI tools in through MCP servers can consume a large share of the context window, leaving the model less room to reason. A simpler approach built on a command-line interface (CLI) is emerging as an effective alternative that uses fewer tokens and is more reliable.

The problem nobody talks about at demo scale

Here's a scenario that'll feel familiar if you've wired up MCP servers for anything beyond a demo.

You connect GitHub, Slack, and Sentry. Three services, maybe 40 tools total. Before your agent has read a single user message, 55,000 tokens of tool definitions are sitting in the context window. That's over a quarter of Claude's 200k limit. Gone.

It gets worse. Each MCP tool costs 550–1,400 tokens for its name, description, JSON schema, field descriptions, enums, and system instructions. Connect a real API surface, say a SaaS platform with 50+ endpoints, and you're looking at 50,000+ tokens just to describe what the agent could do, with almost nothing left for what it should do.
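To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It only multiplies the per-tool range quoted above by a tool count and compares the result to a 200k window. The function name `schema_overhead` is made up for illustration, and real definitions will vary in size.

```python
# Figures taken from the text above; everything else is illustrative.
CONTEXT_WINDOW = 200_000        # context limit cited above
TOKENS_PER_TOOL = (550, 1_400)  # low/high cost of one tool definition


def schema_overhead(tool_count: int) -> dict:
    """Estimate the token range spent on tool definitions alone."""
    low = tool_count * TOKENS_PER_TOOL[0]
    high = tool_count * TOKENS_PER_TOOL[1]
    return {
        "low": low,
        "high": high,
        "low_share": low / CONTEXT_WINDOW,
        "high_share": high / CONTEXT_WINDOW,
    }


if __name__ == "__main__":
    for n in (40, 50):
        est = schema_overhead(n)
        print(
            f"{n} tools: {est['low']:,} to {est['high']:,} tokens "
            f"({est['low_share']:.0%} to {est['high_share']:.0%} of the window)"
        )
```

With 40 tools the estimate runs from 22,000 to 56,000 tokens, so the 55,000 figure from the GitHub, Slack, and Sentry scenario sits near the top of the range.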

One team reported three MCP servers consuming 143,000 of 200,000 tokens. That's 72% of the context window burned on tool definitions. The agent had 57,000 tokens left for the actual conversation, retrieved documents, reasoning, and response. Good luck building anything useful in that space.

This isn't a theoretical concern. David Zhang (@dzhng), building Duet, described ripping out their MCP integrations entirely, even after getting OAuth and dynamic client registration working. The tradeoff was impossible:

Load everything up front → lose working memory for reasoning and history
Limit integrations → agent can only talk to a few services
Build dynamic tool loading → add latency and middleware complexity

He called it a "trilemma." That feels about right.

And the numbers hold up under controlled testing. A recent benchmark by Scalekit ran 75 head-to-head comparisons (same model, Claude Sonnet 4, same tasks, same prompts) and found MCP costing 4 to 32× more tokens than CLI for identical operations. Their simplest task, checking a repo's language, consumed 1,365 tokens via CLI and 44,026 via MCP. The overhead is almost entirely schema: 43 tool definitions injected into every conversation, of which the agent uses one or two.
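A quick sanity check on those benchmark numbers, as a sketch. It assumes the entire gap between the two runs is schema, which the text only says is "almost entirely" true, so the per-definition figure is an approximation. The helper name `overhead_summary` is hypothetical.

```python
def overhead_summary(cli_tokens: int, mcp_tokens: int, tool_count: int):
    """Return (ratio, extra tokens, extra tokens per tool definition)."""
    ratio = mcp_tokens / cli_tokens
    extra = mcp_tokens - cli_tokens
    # Assumption: all of the extra tokens come from injected tool definitions.
    per_tool = extra / tool_count
    return ratio, extra, per_tool


if __name__ == "__main__":
    # Figures quoted above for the "check a repo's language" task.
    ratio, extra, per_tool = overhead_summary(1_365, 44_026, 43)
    print(f"MCP used {ratio:.1f}x the tokens of CLI")
    print(f"Extra tokens: {extra:,} (about {per_tool:.0f} per tool definition)")
```

The ratio comes out just above 32, matching the top of the quoted range, and the implied cost of roughly 990 tokens per definition falls inside the 550–1,400 per-tool range mentioned earlier.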

Three approaches to the same problem

The industry is converging on three responses to context bloat. Each has a sweet spot.

MCP with compression tricks

The first response is to keep MCP but fight the bloat. Teams compress schemas, use tool search to load definitions on demand, or build middleware that slices OpenAPI specs into smaller chunks.
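As a rough illustration of the "tool search" idea, the sketch below keeps a registry of tool definitions outside the prompt and loads only the ones that match the user's request. Everything here is hypothetical: the tool names, the token sizes, and the naive keyword scoring are stand-ins, not how any particular MCP client or server implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDef:
    name: str
    description: str
    schema_tokens: int  # made-up size of the full definition


# Hypothetical registry; a real one would hold full JSON schemas.
REGISTRY = [
    ToolDef("github_get_repo", "Fetch repository metadata such as language and stars", 900),
    ToolDef("github_create_issue", "Create an issue in a repository", 1100),
    ToolDef("slack_post_message", "Post a message to a Slack channel", 800),
    ToolDef("sentry_list_errors", "List recent errors for a project", 1000),
]


def search_tools(query: str, registry: list[ToolDef], limit: int = 2) -> list[ToolDef]:
    """Rank tools by how many query words appear in their name or description."""
    words = set(query.lower().split())
    scored = []
    for tool in registry:
        text = f"{tool.name} {tool.description}".lower().replace("_", " ")
        score = len(words & set(text.split()))
        if score:
            scored.append((score, tool))
    # Stable sort: ties keep registry order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]


if __name__ == "__main__":
    selected = search_tools("what language is this repository", REGISTRY)
    loaded = sum(t.schema_tokens for t in selected)
    total = sum(t.schema_tokens for t in REGISTRY)
    print([t.name for t in selected], f"{loaded} of {total} schema tokens loaded")
```

The saving comes with the cost the trilemma already named: a lookup step before every tool call, plus the risk that a weak match leaves the right tool out of the context.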

This works for small, well-defined interactions like looking up an issue, creating a ticket, or fetching a document. MCP's structured tool calls an

Source: Hacker News