NOW LET US – AI RAG SaaS Studio TP.HCM

Apideck CLI – An AI-agent interface with much lower context consumption than MCP


Integrating tools for AI agents via MCP can consume tens of thousands of tokens before the first user message, leaving less of the context window for reasoning. Apideck CLI is presented as an efficient alternative: a command-line interface that sharply reduces token consumption and improves reliability.

The problem nobody talks about at demo scale

Here's a scenario that'll feel familiar if you've wired up MCP servers for anything beyond a demo.

You connect GitHub, Slack, and Sentry. Three services, maybe 40 tools total. Before your agent has read a single user message, 55,000 tokens of tool definitions are sitting in the context window. That's over a quarter of Claude's 200k limit. Gone.

It gets worse. Each MCP tool costs 550–1,400 tokens for its name, description, JSON schema, field descriptions, enums, and system instructions. Connect a real API surface, say a SaaS platform with 50+ endpoints, and you're looking at 50,000+ tokens just to describe what the agent could do, with almost nothing left for what it should do.

One team reported three MCP servers consuming 143,000 of 200,000 tokens. That's 72% of the context window burned on tool definitions. The agent had 57,000 tokens left for the actual conversation, retrieved documents, reasoning, and response. Good luck building anything useful in that space.
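The arithmetic behind these figures is easy to check. The sketch below is only a back-of-the-envelope calculation using the numbers quoted above; the function name `context_budget` is ours, not part of any tool.

```python
CONTEXT_WINDOW = 200_000  # Claude's context limit as cited in the text

def context_budget(tool_tokens: int, window: int = CONTEXT_WINDOW) -> tuple[int, float]:
    """Return (tokens left over, share of the window taken by tool definitions)."""
    return window - tool_tokens, tool_tokens / window

# 40 tools at 550 to 1,400 tokens each spans 22,000 to 56,000 tokens,
# so the 55,000 figure sits near the top of that range.
low, high = 40 * 550, 40 * 1_400

remaining_small, share_small = context_budget(55_000)
remaining_large, share_large = context_budget(143_000)

print(remaining_small, f"{share_small:.1%}")  # 145000 27.5%
print(remaining_large, f"{share_large:.1%}")  # 57000 71.5%
```

The second case is the "72%" scenario: 143,000 of 200,000 tokens is 71.5%, which leaves the 57,000 tokens mentioned above.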

This isn't a theoretical concern. David Zhang (@dzhng), building Duet, described ripping out their MCP integrations entirely, even after getting OAuth and dynamic client registration working. The tradeoff was impossible:

Load everything up front → lose working memory for reasoning and history

Limit integrations → agent can only talk to a few services

Build dynamic tool loading → add latency and middleware complexity

He called it a "trilemma." That feels about right.

And the numbers hold up under controlled testing. A recent benchmark by Scalekit ran 75 head-to-head comparisons (same model, Claude Sonnet 4, same tasks, same prompts) and found MCP costing 4 to 32× more tokens than CLI for identical operations. Their simplest task, checking a repo's language, consumed 1,365 tokens via CLI and 44,026 via MCP. The overhead is almost entirely schema: 43 tool definitions injected into every conversation, of which the agent uses one or two.

Three approaches to the same problem

The industry is converging on three responses to context bloat. Each has a sweet spot.

MCP with compression tricks

The first response is to keep MCP but fight the bloat. Teams compress schemas, use tool search to load definitions on demand, or build middleware that slices OpenAPI specs into smaller chunks.

This works for small, well-defined interactions like looking up an issue, creating a ticket, or fetching a document. MCP's structured tool calls and typed schemas are genuinely useful when you have a tight set of operations that agents use frequently.

But it adds infrastructure. You need a tool registry, search logic, caching, and routing. You're building a service to manage your services. And you're still paying per-tool token costs every time the agent decides it needs a new capability.
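To make the "tool search" idea concrete, here is a minimal sketch of loading definitions on demand, assuming a naive keyword match. The registry entries, tool names, and token costs are all hypothetical and do not come from any real MCP server; a production registry would more likely use embeddings plus caching and routing, which is exactly the extra infrastructure described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolDef:
    name: str
    description: str
    schema_tokens: int  # assumed cost of injecting this definition into context

# Hypothetical registry; names and costs are illustrative only.
REGISTRY = [
    ToolDef("github_get_issue", "look up a GitHub issue by number", 900),
    ToolDef("github_create_issue", "create a new GitHub issue", 1_200),
    ToolDef("slack_post_message", "post a message to a Slack channel", 700),
    ToolDef("sentry_list_errors", "list recent Sentry error events", 1_100),
]

def search_tools(query: str, registry: list[ToolDef], limit: int = 2) -> list[ToolDef]:
    """Rank tools by how many query words appear in their name or description."""
    # Drop very short words ("a", "to") so they do not count as matches.
    words = {w for w in query.lower().split() if len(w) > 2}

    def score(tool: ToolDef) -> int:
        text = tool.name.replace("_", " ") + " " + tool.description
        return len(words & set(text.lower().split()))

    ranked = sorted(registry, key=score, reverse=True)
    return [tool for tool in ranked if score(tool) > 0][:limit]

selected = search_tools("create a github issue", REGISTRY)
loaded = sum(tool.schema_tokens for tool in selected)
everything = sum(tool.schema_tokens for tool in REGISTRY)
print([tool.name for tool in selected], loaded, everything)
```

Only the matching definitions are injected, but each one still carries its full per-tool cost once it is selected.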

Code execution (the Duet approach)

Duet's answer was to treat the agent like a developer with a persistent workspace. When the agent needs a new integration, it reads the API docs, writes code against the SDK, runs it, and saves the script for reuse.

This is powerful for long-lived workspace agents that maintain state across sessions and need complex workflows (loops, conditionals, polling, batch operations). Things that are awkward to express as individual tool calls become natural in code.
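As an illustration of why polling is more natural in code than as a chain of tool calls, here is a small sketch of the kind of script such an agent might write and save. It is not Duet's code; `fake_export_status` is a stand-in for whatever SDK call a real script would make.

```python
import time
from typing import Callable

def poll_until(fetch: Callable[[], str], done: Callable[[str], bool],
               max_attempts: int = 5, delay_s: float = 0.0) -> tuple[str, int]:
    """Call fetch() until done(status) is true; return (status, attempts used)."""
    for attempt in range(1, max_attempts + 1):
        status = fetch()
        if done(status):
            return status, attempt
        time.sleep(delay_s)
    raise TimeoutError(f"not done after {max_attempts} attempts")

# Stand-in for an SDK call; a real script would query the API here.
_responses = iter(["queued", "running", "running", "complete"])

def fake_export_status() -> str:
    return next(_responses)

status, attempts = poll_until(fake_export_status, lambda s: s == "complete")
print(status, attempts)  # complete 4
```

Expressed as individual tool calls, the same loop would cost one model round trip per status check.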

The downside: your agent is now writing and executing arbitrary code against production APIs. The safety surface is enormous. You need sandboxing, review mechanisms, and a lot of trust in your agent's judgment.

CLI as the agent interface

The third approach is the one we took. Instead of loading schemas into the context window or letting the agent write integration code, you give it a CLI.

A well-designed CLI is a progressive disclosure system by nature. When a human developer needs to use a tool they haven't touched before, they don't read the entire API reference. They run `tool --help`, find the subcommand they need, run `tool subcommand --help`, and get the specific flags for that operation. They pay attention costs proportional to what they actually need.

Agents can do exactly the same thing. And the token economics are dramatically different.
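The layering is visible even in a toy CLI. The sketch below builds a made-up `tool` command with Python's standard `argparse` module (it is not the Apideck CLI) and shows that the top-level help names the subcommands without exposing any of their flags.

```python
import argparse

def build_cli() -> tuple[argparse.ArgumentParser, dict[str, argparse.ArgumentParser]]:
    """Build a toy two-level CLI; return the root parser and its subcommand parsers."""
    root = argparse.ArgumentParser(prog="tool", description="Toy CLI for illustration")
    sub = root.add_subparsers(dest="command")

    invoices = sub.add_parser("invoices", help="work with invoices")
    invoices.add_argument("--data", help="JSON request body")
    invoices.add_argument("--yes", action="store_true", help="skip write confirmation")

    customers = sub.add_parser("customers", help="work with customers")
    customers.add_argument("--limit", type=int, help="maximum rows to return")

    return root, {"invoices": invoices, "customers": customers}

root, subcommands = build_cli()

# Level 1: what `tool --help` would print. Subcommands are listed, flags are not.
top_help = root.format_help()

# Level 2: what `tool invoices --help` would print. Only this subcommand's flags.
invoice_help = subcommands["invoices"].format_help()

print(top_help)
print(invoice_help)
```

An agent that only needs invoices never reads the `customers` flags, so the cost of each step tracks what it actually uses.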

Why CLIs are the pragmatic sweet spot

Progressive disclosure saves tokens

Here's what the Apideck CLI agent prompt looks like. This is the entire thing an AI agent needs in its system prompt:

Use `apideck` to interact with the Apideck Unified API.
Available APIs: `apideck --list`
List resources: `apideck <api> --list`
Operation help: `apideck <api> <resource> <verb> --help`
APIs: accounting, ats, crm, ecommerce, hris, ...
Auth is pre-configured. GET auto-approved. POST/PUT/PATCH prompt (use --yes). DELETE blocked (use --force).
Use --service-id <connector> to target a specific integration.
For clean output: -q -o json

That's ~80 tokens. Compare that to the alternatives:

| Approach | Tokens consumed | When |
|---|---|---|
| Full OpenAPI spec in context | 30,000–100,000+ | Before first message |
| MCP tools (~3,600 per API) | 10,000–50,000+ | Before first message |
| CLI agent prompt | ~80 | Before first message |
| CLI --help call | ~50–200 | Only when needed |
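The permission line in the agent prompt above (GET auto-approved, POST/PUT/PATCH prompting unless `--yes` is passed, DELETE blocked unless `--force` is passed) reads as a small decision table. The sketch below is one reading of that single sentence, not Apideck's implementation; the function name and the handling of unlisted methods are assumptions.

```python
def decide(method: str, yes: bool = False, force: bool = False) -> str:
    """Return 'run', 'prompt', or 'blocked' for an HTTP method under the stated rule."""
    method = method.upper()
    if method == "GET":
        return "run"                           # reads are auto-approved
    if method in {"POST", "PUT", "PATCH"}:
        return "run" if yes else "prompt"      # writes ask unless --yes is given
    if method == "DELETE":
        return "run" if force else "blocked"   # deletes refuse unless --force is given
    return "blocked"                           # assumption: unlisted methods are refused

for method, flags in [("GET", {}), ("POST", {}), ("POST", {"yes": True}),
                      ("DELETE", {}), ("DELETE", {"force": True})]:
    print(method, flags, decide(method, **flags))
```

Encoding the policy in the binary rather than in the prompt means the agent cannot talk itself past it; it has to pass an explicit flag.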

The agent starts with 80 tokens of guidance and discovers capabilities on demand:

# Level 1: What APIs are available? (~20 tokens output)
$ apideck --list
accounting ats connector crm ecommerce hris ...

# Level 2: What can I do with accounting? (~200 tokens output)
$ apideck accounting --list
Resources in accounting API:
  invoices
    list    GET    /accounting/invoices
    get     GET    /accounting/invoices/{id}
    create  POST   /accounting/invoices
    delete  DELETE /accounting/invoices/{id}
  customers
    list    GET    /accounting/customers
    ...

# Level 3: How do I create an invoice? (~150 tokens output)
$ apideck accounting invoices create --help
Usage: apideck accounting invoices create [flags]
Flags:
      --data string         JSON request body (or @file.json)
      --service-id string   Target a specific connector
      --yes                 Skip write confirmation
  -o, --output string       Output format (json|table|yaml|csv)
  ...

Each step costs at most about 200 tokens, loaded only when the agent decides it needs that information. An agent handling an accounting query might consume roughly 400 tokens in total across three discovery calls (about 20, 200, and 150 tokens of output in the session above). The same surface through MCP would cost 10,000+ tokens loaded upfront whether the agent uses them or not.
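Adding up the figures quoted in this section gives a rough sense of the gap. This is only a tally of the approximate output sizes listed above plus the agent prompt; it ignores the few tokens spent on the commands themselves, and the 10,000 figure is the lower bound quoted for MCP.

```python
PROMPT_TOKENS = 80  # CLI agent prompt, as stated in the text

# Approximate output sizes taken from the three-level session in the text.
DISCOVERY_STEPS = {
    "apideck --list": 20,
    "apideck accounting --list": 200,
    "apideck accounting invoices create --help": 150,
}

MCP_UPFRONT = 10_000  # lower bound quoted for the same surface via MCP

def cli_total(prompt: int, steps: dict[str, int]) -> int:
    """Prompt plus every discovery output the agent actually requested."""
    return prompt + sum(steps.values())

total = cli_total(PROMPT_TOKENS, DISCOVERY_STEPS)
ratio = round(MCP_UPFRONT / total, 1)
print(total)  # 450
print(ratio)  # 22.2
```

Even with the prompt included, the CLI path stays in the hundreds of tokens, and an agent that stops after the first level pays far less.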

This mirrors how Claude Agent Skills work. Metadata first, full details only when selected, reference material only when needed. The CLI is doing the same thing through a different mechanism.

Scalekit's benchmark independently validated this pattern. They found that even a minimal ~800-token "skills file" (a document of CLI tips and common workflows) reduced tool calls by a third and latency by a third compared to a bare CLI. Our approach takes it further: the ~80-token agent prompt provides the same progressive discovery at a tenth of the cost. The principle is the same. A small, upfront hint about how to navigate the tool is worth more than thousands of tokens of exhaustive schema.

Reliability: local beats remote

There's a dimension of the MCP problem that doesn't get enough attention: availability.

Scalekit's benchmark recorded a 28% failure rate on MCP calls to GitHub's Copilot server. Out of 25 runs, 7 failed with TCP-level connection timeouts. The remote server simply didn't respond in time. Not a protocol error, not a bad tool call. The connection never completed.

CLI agents don't have this failure mode. The binary runs locally. There's no remote server to time out, no connection pool to exhaust, no intermediary to go down. When your agent runs `apideck accounting invoices list`, it makes a direct HTTPS call to the Apideck API. One hop, not two.

This matters at scale. At 10,000 operations per month, a 28% failure rate means roughly 2,800 retries, each burning additional tokens and latency. Scalekit estimated the monthly cost difference at $3.20 for CLI vs. $104.40 for MCP just on retries.
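The retry estimate follows directly from the benchmark's failure count. The sketch below reproduces the first-order figure and, as an added assumption not made in the source, also shows what happens if retries themselves fail at the same rate; it does not attempt to reproduce Scalekit's dollar figures.

```python
def first_order_retries(operations: int, failure_rate: float) -> float:
    """One retry per failed call, assuming every retry succeeds."""
    return operations * failure_rate

def expected_retries(operations: int, failure_rate: float) -> float:
    """Expected retries if each retry can fail at the same rate (geometric series)."""
    return operations * failure_rate / (1 - failure_rate)

rate = 7 / 25  # 7 timeouts out of 25 runs
print(rate)                                       # 0.28
print(round(first_order_retries(10_000, rate)))   # 2800
print(round(expected_retries(10_000, rate)))      # 3889
```

Under the stricter assumption the retry count rises to roughly 3,900, so the 2,800 figure is best read as a lower bound.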

© 2026 Now Let Us. All rights reserved.

Source: Hacker News
