When MCP launched in late 2024, Anthropic described it as “a USB-C port for AI applications.” The industry responded by racing to ship MCP servers: Linear, Notion, Slack, GitHub, Postgres, and hundreds more. Every SaaS landing page added “MCP supported” to its feature list.
A year and a half later, the pendulum has swung hard in the other direction. Engineers are publishing benchmarks showing MCP consuming 10%+ of context windows, adding 3-9x latency, and duplicating functionality that existing CLIs already handle better. “MCP is dead” headlines are multiplying.
Neither the breathless hype nor the angry backlash tells the full story. From the measurements the community has published, I see a layered integration model where CLIs, Skills, and MCP each solve distinct problems — and the teams shipping the fastest AI workflows are using all three, not betting on one.
Why This Debate Matters Now
The stakes aren’t academic. The way you connect your LLM to tools determines:
- How much of your context window is available for actual work
- How quickly your agent can act on a request
- How easy it is to debug when something goes wrong
- How composable your toolchain is across human and AI workflows
In 2024, the default answer was “MCP server.” In mid-2026, that answer is looking naive. The protocol hasn’t stood still (Anthropic’s Tool Search with deferred loading cut context overhead by 85%+ according to published measurements), but the fundamental architectural questions remain.
The Three Approaches
The debate has three real contenders, not two.
CLI / Bash
The oldest and most proven. LLMs already understand CLI tools: they’ve trained on millions of man pages, Stack Overflow answers, and shell scripts. When an agent runs gh pr view 123 or jq '.items[] | select(.status == "open")', it’s using the same interface a human would. No new protocol. No background process. No tool schemas preloaded into context.
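Here is a minimal sketch of that CLI path from the agent side. It uses a hypothetical helper named run_cli_tool. The model proposes an argument list, and the harness executes it without a shell. Stdout goes straight back into context, or the exit code and stderr if the command fails. Real agent harnesses add permission checks and output truncation on top of this.

```python
import subprocess
import sys


def run_cli_tool(argv: list[str], timeout_s: float = 30.0) -> str:
    """Execute an agent-proposed command and return text for the model to read.

    Hypothetical helper: argv is a list such as ["gh", "pr", "view", "123"].
    It runs without a shell, so the only context cost is the command and its output.
    """
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    if result.returncode != 0:
        # Return the exit code and stderr so the model can fix its flags and retry.
        return f"exit {result.returncode}: {result.stderr.strip()}"
    return result.stdout


if __name__ == "__main__":
    # Portable stand-in for a real CLI such as gh or jq.
    print(run_cli_tool([sys.executable, "-c", "print('hello from a CLI')"]), end="")
```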
MCP (Model Context Protocol)
A standardized JSON-RPC protocol for connecting LLMs to external tools. MCP servers define tools with names, descriptions, and parameter schemas. The LLM calls these tools through a client (Claude Code, ChatGPT, VS Code). The promise: one integration pattern for everything. The cost: tool definitions consume context, every call goes through an extra process hop, and the server infrastructure has moving parts.
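To make the “tool definitions consume context” cost concrete, here is a sketch of the two JSON-RPC pieces involved, following the shapes in the MCP specification:

- a tool definition (name, description, and a JSON Schema under inputSchema), as a server advertises it;
- the tools/call request a client sends.

The get_issue tool and its parameter are hypothetical.

```python
import json

# A tool definition as a server would list it in a tools/list result.
# "get_issue" and its "id" parameter are hypothetical examples.
GET_ISSUE_TOOL = {
    "name": "get_issue",
    "description": "Fetch a single issue by its identifier.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "id": {"type": "string", "description": "Issue identifier, e.g. ENG-123"},
        },
        "required": ["id"],
    },
}


def tools_call_request(request_id: int, name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request as an MCP client sends it."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


if __name__ == "__main__":
    # Without deferred loading, every definition like GET_ISSUE_TOOL is placed in
    # the model's context before the first call, whether or not it is ever used.
    print(len(json.dumps(GET_ISSUE_TOOL)), "characters of schema for one tool")
    print(tools_call_request(1, "get_issue", {"id": "ENG-123"}))
```

Multiply one definition like this by the 42 tools in Linear’s server, and a five-figure token count for tool schemas alone stops being surprising.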
Skills (Claude Code’s Skill System)
Anthropic’s less-hyped alternative. Skills are structured instruction files loaded on demand into context. They embed CLI commands, API calls, and workflow instructions that the LLM executes directly. Unlike MCP, Skills don’t load tool definitions upfront. Apart from a short name and description, they only occupy context when invoked. A Skill for Linear might contain curl commands and jq parsing instructions, giving the LLM everything it needs at negligible context cost compared to pre-loading full MCP tool definitions.
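The on-demand behavior can be modeled in a few lines. This is a toy sketch, not Claude Code’s implementation. It assumes that only each skill’s name and one-line description are visible until the skill is invoked. At that point the full body (curl commands, jq filters, workflow steps) enters context. The Skill and SkillRegistry names and the linear-issues skill are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str  # one-liner the model sees up front
    body: str  # full instructions, loaded only when the skill is invoked


@dataclass
class SkillRegistry:
    skills: dict[str, Skill] = field(default_factory=dict)
    context: list[str] = field(default_factory=list)  # stand-in for the context window

    def add(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def index(self) -> str:
        """What is always visible: one short line per skill."""
        return "\n".join(f"{s.name}: {s.description}" for s in self.skills.values())

    def invoke(self, name: str) -> str:
        """Load a skill's full body into context only when it is actually needed."""
        body = self.skills[name].body
        self.context.append(body)
        return body


registry = SkillRegistry()
registry.add(Skill(
    name="linear-issues",
    description="Look up Linear issues via the GraphQL API with curl and jq.",
    body="1. POST the issue query to the GraphQL endpoint with curl.\n"
         "2. Pipe the response through jq to extract title and state.",
))
```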
The Numbers: What Each Approach Costs
Context Consumption
The backlash started with numbers. Quandri Engineering’s May 2026 measurements are the most concrete I’ve seen:
Tool definitions loaded with 4 MCP servers:
| MCP Server | Tools | Approx. Tokens |
|---|---|---|
| Linear | 42 | ~12,807 |
| Notion | 14 | ~4,039 |
| Slack | 12 | ~3,792 |
| Postgres | 9 | ~438 |
| Total | 77 | ~21,077 |
That’s 10.5% of Claude’s 200K context window and 16.5% of GPT-4o’s 128K window — consumed before a single tool call is made. Linear alone accounts for 12,807 tokens of tool schemas, even if you only ever use two tools from it.
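The percentages follow directly from the table. Here is a quick check using the per-server figures as published. They sum to 21,076; the table’s total rounds to ~21,077.

```python
# Approximate tool-definition tokens per MCP server, from the Quandri table above.
TOOL_TOKENS = {"Linear": 12_807, "Notion": 4_039, "Slack": 3_792, "Postgres": 438}
CONTEXT_WINDOWS = {"Claude": 200_000, "GPT-4o": 128_000}


def context_share(tokens: int, window: int) -> float:
    """Fraction of a context window consumed before any tool call is made."""
    return tokens / window


total = sum(TOOL_TOKENS.values())
for model, window in CONTEXT_WINDOWS.items():
    print(f"{model}: {context_share(total, window):.1%} of {window:,} tokens")
print(f"Linear's share of the overhead: {TOOL_TOKENS['Linear'] / total:.0%}")
```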
The same operation via CLI vs MCP:
To look up a Linear issue:
| Approach | Token Cost |
|---|---|
| CLI (curl + GraphQL query) | ~200 tokens total |
| MCP (tool definitions + call) | ~12,957 tokens total |
That’s ~65x more tokens for the MCP path. The overhead comes from pre-loading 42 tool definitions you don’t need.
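For comparison, the CLI path is a single HTTP request. Its only context cost is the query text and the response. The sketch below builds the request a one-line curl command would send, under three assumptions:

- Linear’s GraphQL endpoint is at https://api.linear.app/graphql;
- a personal API key is passed as-is in the Authorization header;
- the issue(id:) query accepts an identifier such as ENG-123.

Verify all three against Linear’s current API documentation. build_issue_request and fetch_issue are hypothetical helper names.

```python
import json
import urllib.request

# Assumptions to verify against Linear's API docs: endpoint URL, personal API key
# sent as-is in the Authorization header, and issue(id:) accepting an identifier.
LINEAR_GRAPHQL_URL = "https://api.linear.app/graphql"
ISSUE_QUERY = "query Issue($id: String!) { issue(id: $id) { identifier title state { name } } }"


def build_issue_request(issue_id: str, api_key: str) -> urllib.request.Request:
    """Build the POST that a one-line curl command would send."""
    payload = json.dumps({"query": ISSUE_QUERY, "variables": {"id": issue_id}}).encode()
    return urllib.request.Request(
        LINEAR_GRAPHQL_URL,
        data=payload,
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


def fetch_issue(issue_id: str, api_key: str) -> dict:
    """Send the request and return the decoded JSON response (needs network and a real key)."""
    with urllib.request.urlopen(build_issue_request(issue_id, api_key), timeout=10) as resp:
        return json.load(resp)
```

Nothing else is preloaded: the query string, the issue identifier, and whatever fields the response returns are the whole footprint.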
Caveat: Anthropic’s Tool Search (deferred loading) changes this picture. With ENABLE_TOOL_SEARCH=true (default in current Claude Code), tool schemas load on-demand rather than upfront. Published measurements show this reduces context usage by 85%+. The “10% of context” problem is largely solved for users on current versions.
But deferred loading introduces a new cost: the search step. Claude must now discover which tools exist before calling them, adding a round-trip. A tradeoff — and a reminder the protocol is still maturing.
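The tradeoff is easier to see in a toy model. This sketch is not how Claude Code implements Tool Search. It only assumes the behavior described above:

- full schemas stay out of context until a search surfaces them;
- that search is an extra round-trip before the real call.

DeferredToolIndex and the tool names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DeferredToolIndex:
    """Toy model of deferred loading: schemas enter context only after a search finds them."""

    schemas: dict[str, dict]  # every tool the server offers, kept out of context
    loaded: dict[str, dict] = field(default_factory=dict)  # definitions now in context
    round_trips: int = 0

    def search(self, keyword: str) -> list[str]:
        self.round_trips += 1  # the extra discovery step deferred loading introduces
        hits = [name for name in self.schemas if keyword in name]
        for name in hits:
            self.loaded[name] = self.schemas[name]
        return hits

    def call(self, name: str, **arguments: object) -> str:
        if name not in self.loaded:
            raise LookupError(f"{name} has not been discovered yet; search for it first")
        self.round_trips += 1
        return f"called {name} with {arguments}"


if __name__ == "__main__":
    # 42 hypothetical Linear tools, mirroring the server in the table above.
    tools = {f"linear_tool_{i}": {"name": f"linear_tool_{i}"} for i in range(41)}
    tools["get_issue"] = {"name": "get_issue"}
    index = DeferredToolIndex(tools)
    index.search("get_issue")
    index.call("get_issue", id="ENG-123")
    print(f"{len(index.loaded)} of {len(tools)} schemas in context, {index.round_trips} round-trips")
```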
Latency
The Quandri Engineering benchmarks, building on ejholmes’ analysis, showed the Jira MCP server running 3x slower per call than direct API access, and 9.4x slower on the first call due to server initialization.
MCP adds an extra process between the LLM and the target API. Every tool call goes through:
1. LLM → Client → MCP Server → API → MCP Server → Client → LLM
Instead of the CLI path:
1. LLM → CLI binary → stdout → LLM
The extra hop matters less for long-running operations (database queries, file writes) and more for rapid, iterative tool use — which is exactly what agents do most.
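To check the first-call versus per-call gap on your own setup, a minimal timing harness is enough. It assumes each integration can be wrapped in a zero-argument callable, for example:

- a lambda that invokes an MCP tool through your client;
- a lambda that runs the equivalent CLI command.

measure_call_latency is a hypothetical name. The numbers it produces are specific to your machine and network.

```python
import statistics
import time
from typing import Callable


def measure_call_latency(tool: Callable[[], object], calls: int = 10) -> dict[str, float]:
    """Time the first (cold) call separately from the median of the warm calls after it."""
    if calls < 2:
        raise ValueError("need at least two calls to compare cold and warm latency")
    timings = []
    for _ in range(calls):
        start = time.perf_counter()
        tool()
        timings.append(time.perf_counter() - start)
    steady = statistics.median(timings[1:])
    return {
        "first_call_s": timings[0],
        "steady_median_s": steady,
        # Guard against a zero median for trivially fast callables.
        "cold_to_warm_ratio": timings[0] / max(steady, 1e-9),
    }
```

Running it once against an MCP tool and once against the CLI equivalent gives you both the per-call ratio and the cold-start ratio for your own setup.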
Reliability
MCP servers are processes. They fail to start, they crash mid-session, and they require re-authentication. The Quandri post cites init failure, repeated re-auth, and mid-session tool death as daily friction.
Claude Code documentation confirms this: stdio servers don’t auto-reconnect after failure. HTTP/SSE servers get five retry attempts with exponential backoff, but after that they’re marked as failed. For a developer in the middle of a complex task, an MCP server failure means restarting the session.
CLIs have no such moving parts. They’re binaries on disk. They execute when called and return when done. No background state to manage.
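The HTTP/SSE reconnection behavior described above looks roughly like this: a bounded number of attempts with exponential backoff, after which the server is given up on and marked failed. It is a sketch only. The one-second base delay and the ServerFailed and reconnect_with_backoff names are assumptions. The sleep function is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServerFailed(RuntimeError):
    """Raised once every reconnection attempt has been used up."""


def reconnect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_delay_s * 2 ** attempt)  # 1s, 2s, 4s, 8s with the defaults
    raise ServerFailed(f"gave up after {max_attempts} attempts") from last_error
```

Every branch here is state a CLI call never needs: a binary that fails simply returns an exit code, and the next invocation starts fresh.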
The Comparison Dimensions
Performance
| Dimension | CLI | Skills | MCP |
|---|---|---|---|
| Context overhead | 0 tokens | Loaded on demand (minimal token overhead) | ~21K tokens for 4 servers (with Tool Search: ~85% less) |
| Call latency | Direct execution | Direct execution | ~3x per call, ~9x on first call (Jira benchmark) |
| Composability | Pipes, jq, grep, redirect | CLI-based, inherits pipeability | JSON-RPC responses, server-constrained |
| Reliability | Binary on disk, no moving parts | No moving parts (in-context instructions) | Server process dependency |
Developer Experience
| Dimension | CLI | Skills | MCP |
|---|---|---|---|
| Debugging | Run same command in terminal | Run same command in terminal | Inspect JSON transport logs |
| Auth | Existing flows (gh auth, aws sso, kubeconfig) | Inherits CLI auth | Per-server auth, OAuth flow, re-auth cycles |
| Permissions | Granular: allowlist specific commands | Granular: allowlist specific skill content | All-or-nothing: allowlist by tool name only |
| Setup cost | Already installed (gh, aws, psql) | Add skill file to | Install server, configure auth, manage process lifecycle |
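The difference in permission granularity can be sketched as two checks. Both are hypothetical and do not reproduce Claude Code’s actual permission-rule syntax.

- The CLI check matches argument prefixes, so gh pr view can be allowed while gh pr merge is not.
- The name-only MCP check approves or rejects a tool without looking at its arguments.

```python
import shlex

# Hypothetical allowlists for illustration.
CLI_ALLOW = [("gh", "pr", "view"), ("gh", "pr", "list"), ("kubectl", "get")]
MCP_ALLOW = {"get_issue"}

# Conservative: refuse anything that could chain or redirect commands in a shell.
# This also rejects legitimate quoted pipes such as jq filters.
SHELL_METACHARACTERS = set(";|&`$<>")


def cli_allowed(command: str) -> bool:
    """Allow a command only if its argv starts with an allowlisted prefix."""
    if SHELL_METACHARACTERS & set(command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return False
    return any(tuple(argv[: len(prefix)]) == prefix for prefix in CLI_ALLOW)


def mcp_allowed(tool_name: str, arguments: dict) -> bool:
    """Name-only check: the arguments are never inspected."""
    return tool_name in MCP_ALLOW
```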
Safety & Security
| Dimension | CLI | Skills | MCP |
|---|---|---|---|
| Query validation | None (LLM writes raw SQL) | None (LLM writes raw SQL) | Server-level read-only enforcement possible |
| Credential exposure | In prompt or env vars | In prompt or env vars | Managed server-side |
| Attack surface | Existing tool surface | No new processes (inherits CLI surface) | Server process per integration, prompt injection risk |
This is the one dimension where MCP has a genuine advantage for production deployments. An MCP Postgres server can enforce read-only mode, block DROP TABLE, and keep credentials out of the prompt. A CLI approach cannot — the LLM either has full psql access or none.
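Server-side enforcement means the guard lives in code the model cannot rewrite. This sketch uses Python’s built-in sqlite3 as a self-contained stand-in for Postgres. SQLite’s query_only pragma makes the connection reject writes such as INSERT or DROP TABLE. The tool also refuses any SQL that mentions PRAGMA, so the guard cannot be switched off through the tool itself. ReadOnlyQueryTool is a hypothetical name. A real Postgres server would rely on Postgres’s own mechanisms, for example running each query in a read-only transaction. This is a sketch, not a complete sandbox.

```python
import sqlite3


class ReadOnlyQueryTool:
    """Hypothetical MCP-style query tool that enforces read-only access server-side."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        # With query_only on, SQLite refuses CREATE, INSERT, UPDATE, DELETE, DROP, etc.
        self._conn.execute("PRAGMA query_only = ON")

    def run(self, sql: str) -> list[tuple]:
        # Crude but conservative: reject any SQL mentioning pragma (even in comments),
        # so the model cannot turn the guard back off.
        if "pragma" in sql.lower():
            raise PermissionError("PRAGMA statements are not allowed")
        # Writes fail inside SQLite and surface as sqlite3.OperationalError.
        return self._conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE issues (id TEXT, title TEXT)")
    conn.execute("INSERT INTO issues VALUES ('ENG-1', 'Fix login')")
    conn.commit()
    tool = ReadOnlyQueryTool(conn)
    print(tool.run("SELECT * FROM issues"))
    try:
        tool.run("DROP TABLE issues")
    except sqlite3.OperationalError as exc:
        print("blocked:", exc)
```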
Ecosystem Maturity
MCP has the broadest ecosystem. The Anthropic Directory lists hundreds of reviewed connectors. ChatGPT, VS Code, Cursor, and Zed all support MCP. The ecosystem effect is real: if you need to connect to an obscure SaaS tool, MCP probably has a connector and a CLI probably doesn’t.
Skills are Claude Code-only. CLIs are everywhere, but require the LLM to know the correct flags and syntax. This matters less than it used to — modern LLMs handle CLI syntax well.
Decision Matrix: Which Approach for Which Scenario?
| Scenario | Best Fit | Why |
|---|---|---|
| Local dev / personal DB | CLI + Skills | Zero context cost, composable, easy to debug. Skills embed the psql commands. |
| Production DB / shared team | MCP | Read-only enforcement at server level. Credential management matters. |
| Services with mature CLIs (GitHub, AWS, Kubernetes) | CLI | The LLM already knows them. Zero integration cost, and failures reproduce in the terminal. |
| Services without CLIs (Slack, Notion, Linear) | MCP | No better option exists. But prefer Skills + HTTP API when possible. |
| Complex multi-step workflows (commit → review → deploy) | Skills | On-demand loading, composable CLI pipeline steps, easy to maintain. |
| Event-driven reactivity (monitoring alerts, webhook responses) | MCP Channels | MCP’s bidirectional capability (channels) has no CLI equivalent. |
| Non-developer users | MCP | Terminal-free, abstracted auth, graphical tool selection. |
| Rapid prototyping, one-off data transformation | CLI | DuckDB, jq, grep — already installed, already known. |
Key insight: The teams I’ve seen ship the fastest AI workflows aren’t picking one approach. They’re layering them.
The Synthesis: Three Layers, Not One Protocol
- Layer 1: CLI for the basics. The tools engineers already use: gh, aws, kubectl, psql, docker. Zero integration cost. The LLM already knows them. When something breaks, you can reproduce it in the terminal immediately.
- Layer 2: Skills for the workflows. Structured instruction files that wrap CLI commands with context-specific guidance. A “Deploy to staging” skill embeds the exact aws and kubectl commands, the environment variables, and the validation steps. It loads on demand and costs negligible context.
- Layer 3: MCP for the gaps. Services without CLIs (Slack, Notion, Figma). Production databases where safety guardrails matter. Event-driven channels where the server pushes messages into the session.
This is what the Quandri team actually runs. Their stack: “Bash + CLI for tools we already use day-to-day. Skills for repeatable multi-step workflows. MCP for services without a strong CLI, and where team-wide auth or permission scoping matters.”
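The three-layer rule can be written down as a small routing function. It is a hypothetical encoding of the decision matrix and the stack quoted above, not a tool anyone ships. The Integration fields are assumptions about which facts drive the choice.

```python
from dataclasses import dataclass


@dataclass
class Integration:
    """Hypothetical description of one integration an agent needs."""

    name: str
    has_mature_cli: bool
    multi_step_workflow: bool = False
    needs_server_side_guardrails: bool = False  # e.g. read-only production DB, shared credentials
    event_driven: bool = False


def choose_layer(i: Integration) -> str:
    # Layer 3: anything that needs a server (guardrails, push events, no usable CLI).
    if i.needs_server_side_guardrails or i.event_driven or not i.has_mature_cli:
        return "MCP"
    # Layer 2: repeatable multi-step work gets a Skill that wraps the CLI commands.
    if i.multi_step_workflow:
        return "Skill"
    # Layer 1: everything else stays a plain CLI call.
    return "CLI"
```

The matrix’s caveat for services without CLIs (prefer Skills plus the HTTP API when possible) would be one more branch ahead of the MCP fallback.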
The Verdict
MCP isn’t dead. But the “one protocol to rule them all” vision is. And that’s healthy.
The backlash has been productive because it forced the ecosystem to reckon with real tradeoffs: context is scarce, latency matters, and every extra abstraction layer costs something. The response — Tool Search, deferred loading, server-level tool timeouts, managed configuration — shows Anthropic is listening. But it also shows that MCP is still catching up to problems that CLIs never had.
If you’re choosing today:
- Start with CLI. If a command-line tool already exists and your LLM can use it, adding MCP is just another abstraction with no benefit. The token savings alone are worth it.
- Add Skills for repeatable, multi-step workflows. They’re the most efficient way to give an LLM domain-specific guidance without burning context.
- Use MCP where it uniquely helps. No CLI? Need read-only database enforcement? Want event-driven reactivity? MCP earns its keep there.
The teams that optimize for minimum abstraction and maximum debuggability will ship faster. The protocol wars are a distraction. The real question is how simple you can keep it.
Further Reading
- MCP is dead (Quandri Engineering) — The original measurement-driven critique with token-level analysis of MCP context consumption
- MCP is dead. Long live the CLI (Eric Holmes) — The philosophical case for CLI-first, with practical examples of composability and debugging advantages
- Connect Claude Code to tools via MCP (Anthropic Docs) — Official documentation covering Tool Search, deferred loading, channels, and managed configuration
- Model Context Protocol Introduction — The official MCP specification and ecosystem overview
- Introducing the Model Context Protocol (Anthropic) — The original announcement with the USB-C framing and ecosystem vision