Abstraction Tax: Measuring What MCP Costs Against CLI and Skills

When MCP launched in late 2024, Anthropic described it as a USB-C port for AI applications. The industry responded by racing to ship MCP servers: Linear, Notion, Slack, GitHub, Postgres, and hundreds more. Every SaaS landing page added “MCP supported” to its feature list.

A year and a half later, the pendulum has swung hard in the other direction. Engineers are publishing benchmarks showing MCP consuming 10%+ of context windows, adding 3-9x latency, and duplicating functionality that existing CLIs already handle better. “MCP is dead” headlines are multiplying.

Neither the breathless hype nor the angry backlash tells the full story. From the measurements the community has published, I see a layered integration model where CLIs, Skills, and MCP each solve distinct problems — and the teams shipping the fastest AI workflows are using all three, not betting on one.

Why This Debate Matters Now

The stakes aren’t academic. The way you connect your LLM to tools determines:

  • How much of your context window is available for actual work
  • How quickly your agent can act on a request
  • How easy it is to debug when something goes wrong
  • How composable your toolchain is across human and AI workflows

In 2024, the default answer was “MCP server.” In mid-2026, that answer is looking naive. The protocol hasn’t stood still (Anthropic’s Tool Search with deferred loading cut context overhead by 85%+, according to published measurements), but the fundamental architectural questions remain.

The Three Approaches

The debate has three real contenders, not two.

CLI / Bash

The oldest and most proven. LLMs already understand CLI tools — they’ve trained on millions of man pages, Stack Overflow answers, and shell scripts. When an agent runs gh pr view 123 or jq '.items[] | select(.status == "open")', it’s using the same interface a human would. No new protocol. No background process. No context overhead.
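As a minimal sketch of what this path looks like from the harness side, the Python below runs a command as an argument list and hands its stdout back as text. The `run_cli` helper is hypothetical, and the example invokes the Python interpreter itself as a stand-in binary so that it runs anywhere; in practice the argument list would be something like the `gh pr view 123` call above.

```python
import subprocess
import sys


def run_cli(argv: list[str], timeout: float = 30.0) -> str:
    """Run a command without a shell and return its stdout as text."""
    result = subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Stand-in for a real tool invocation such as ["gh", "pr", "view", "123"].
output = run_cli([sys.executable, "-c", "print('status: open')"])
print(output.strip())
```

There is no server to start and no schema to load: the only state is the binary on disk and whatever it prints.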

MCP (Model Context Protocol)

A standardized JSON-RPC protocol for connecting LLMs to external tools. MCP servers define tools with names, descriptions, and parameter schemas. The LLM calls these tools through a client (Claude Code, ChatGPT, VS Code). The promise: one integration pattern for everything. The cost: tool definitions consume context, every call goes through an extra process hop, and the server infrastructure has moving parts.
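To make the wire format concrete, here is a rough Python sketch of the JSON-RPC 2.0 message a client sends when the model invokes a tool through MCP's `tools/call` method. The tool name `get_issue` and its `issue_id` argument are hypothetical and not taken from any real server; transport, initialization, and response handling are left out.

```python
import json
from itertools import count

_ids = count(1)  # each JSON-RPC request in a session needs its own id


def tool_call_request(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request for MCP's tools/call method."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and argument, for illustration only.
print(tool_call_request("get_issue", {"issue_id": "ENG-123"}))
```

Before a call like this can happen, the client has to know the tool's name, description, and parameter schema, which is where the context cost discussed below comes from.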

Skills (Claude Code’s Skill System)

Anthropic’s less-hyped alternative. Skills are structured instruction files loaded on-demand into context. They embed CLI commands, API calls, and workflow instructions that the LLM executes directly. Unlike MCP, Skills don’t load tool definitions upfront — they only occupy context when invoked. A Skill for Linear might contain curl commands and jq parsing instructions, giving the LLM everything it needs at negligible context cost compared to pre-loading full MCP tool definitions.

The Numbers: What Each Approach Costs

Context Consumption

The backlash started with numbers. Quandri Engineering’s May 2026 measurements are the most concrete I’ve seen:

Tool definitions loaded with 4 MCP servers:

| MCP Server | Tools | Approx. Tokens |
| --- | --- | --- |
| Linear | 42 | ~12,807 |
| Notion | 14 | ~4,039 |
| Slack | 12 | ~3,792 |
| Postgres | 9 | ~438 |
| Total | 77 | ~21,077 |

That’s 10.5% of Claude’s 200K context window and 16.5% of GPT-4o’s 128K window — consumed before a single tool call is made. Linear alone accounts for 12,807 tokens of tool schemas, even if you only ever use two tools from it.
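The percentages are easy to reproduce. The short Python sketch below recomputes them from the per-server figures quoted above; the dictionary and function names are my own, and because the per-server counts are approximate their sum comes out one token below the reported total.

```python
# Approximate tool-definition token counts per MCP server, as quoted above.
TOOL_TOKENS = {"Linear": 12_807, "Notion": 4_039, "Slack": 3_792, "Postgres": 438}


def context_share(tokens: int, window: int) -> float:
    """Percentage of a context window taken up by `tokens`."""
    return 100 * tokens / window


total = sum(TOOL_TOKENS.values())
print(f"total: {total} tokens")
for label, window in {"200K window": 200_000, "128K window": 128_000}.items():
    print(f"{label}: {context_share(total, window):.1f}%")
print(f"Linear's share of the total: {context_share(TOOL_TOKENS['Linear'], total):.1f}%")
```

Run as written, this gives 10.5% and 16.5% for the two windows, with Linear responsible for roughly three fifths of the total.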

The same operation via CLI vs MCP:

To look up a Linear issue:

| Approach | Token Cost |
| --- | --- |
| CLI (curl + GraphQL query) | ~200 tokens total |
| MCP (tool definitions + call) | ~12,957 tokens total |

That’s ~65x more tokens for the MCP path. The overhead comes from pre-loading 42 tool definitions you don’t need.

Caveat: Anthropic’s Tool Search (deferred loading) changes this picture. With ENABLE_TOOL_SEARCH=true (default in current Claude Code), tool schemas load on-demand rather than upfront. Published measurements show this reduces context usage by 85%+. The “10% of context” problem is largely solved for users on current versions.

But deferred loading introduces a new cost: the search step. Claude must now discover which tools exist before calling them, adding a round-trip. A tradeoff — and a reminder the protocol is still maturing.
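For a sense of scale, the following Python sketch works through the arithmetic in this section. The constants are the figures quoted above; applying the 85% reduction uniformly to the four-server total is my own assumption, since the published measurement was not necessarily taken on this configuration, and the cost of the search round-trip is not modeled at all.

```python
LINEAR_SCHEMA_TOKENS = 12_807  # Linear tool definitions
MCP_LOOKUP_TOKENS = 12_957     # tool definitions plus one call
CLI_LOOKUP_TOKENS = 200        # curl + GraphQL query
ALL_SERVERS_TOKENS = 21_077    # reported total for the four servers
REDUCTION = 0.85               # "85%+" reported for Tool Search, treated as a floor

# How much of the MCP lookup is the call itself rather than preloaded schemas.
call_tokens = MCP_LOOKUP_TOKENS - LINEAR_SCHEMA_TOKENS

# Ratio behind the "~65x" figure.
ratio = MCP_LOOKUP_TOKENS / CLI_LOOKUP_TOKENS

# Upper bound on upfront cost if the reduction applied uniformly (an assumption).
deferred_ceiling = ALL_SERVERS_TOKENS * (1 - REDUCTION)

print(f"call itself: {call_tokens} tokens")
print(f"MCP vs CLI: {ratio:.1f}x")
print(f"deferred loading, at most: {deferred_ceiling:.0f} tokens")
```

Under these assumptions the call itself is about 150 tokens, comparable to the CLI path; almost all of the gap is schema preloading, which is exactly what deferred loading targets.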

Latency

The Quandri Engineering benchmarks, building on ejholmes’ analysis, showed Jira MCP being 3x slower per call than direct API access, and 9.4x slower on the first call because of server initialization.

MCP adds an extra process between the LLM and the target API. Every tool call goes through:

LLM → Client → MCP Server → API → MCP Server → Client → LLM

Instead of the CLI path:

LLM → CLI binary → stdout → LLM

The extra hop matters less for long-running operations (database queries, file writes) and more for rapid, iterative tool use — which is exactly what agents do most.
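If you want to check the first-call versus steady-state gap on your own setup, a small harness like the Python sketch below is enough. It is not the methodology behind the published benchmarks; `first_vs_warm` and `StubServer` are hypothetical names, and the stub's sleep durations are arbitrary placeholders for a real MCP or CLI call.

```python
import time
from statistics import mean
from typing import Callable


def first_vs_warm(
    call: Callable[[], object],
    runs: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> tuple[float, float]:
    """Return (first-call seconds, mean seconds of the remaining calls)."""
    if runs < 2:
        raise ValueError("need at least two runs to separate cold from warm")
    durations = []
    for _ in range(runs):
        start = clock()
        call()
        durations.append(clock() - start)
    return durations[0], mean(durations[1:])


class StubServer:
    """Hypothetical stand-in that pays a one-time startup cost on first use."""

    def __init__(self, startup: float = 0.05, per_call: float = 0.01) -> None:
        self.startup, self.per_call, self.started = startup, per_call, False

    def call(self) -> None:
        if not self.started:
            time.sleep(self.startup)  # simulated server initialization
            self.started = True
        time.sleep(self.per_call)     # simulated per-call work


cold, warm = first_vs_warm(StubServer().call)
print(f"first call: {cold * 1000:.0f} ms, warm mean: {warm * 1000:.0f} ms")
```

Swapping the stub for a function that performs one real tool call gives the two numbers that matter for agents: the cold-start penalty and the per-call overhead that repeats on every iteration.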

Reliability

MCP servers are processes. They fail to start, they crash mid-session, they require re-authentication. The Quandri post cites init failures, repeated re-auth, and mid-session tool death as daily friction.

Claude Code documentation confirms this: stdio servers don’t auto-reconnect after failure. HTTP/SSE servers get five retry attempts with exponential backoff, but after that they’re marked as failed. For a developer in the middle of a complex task, an MCP server failure means restarting the session.
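The retry behavior described here can be sketched in a few lines of Python. This is an illustration of exponential backoff in general, not Claude Code's actual implementation: the function name, the one-second base delay, and the reading of "five retry attempts" as five retries after the initial try are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,  # hypothetical; the real delays are not stated here
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect(); on ConnectionError retry, doubling the wait each time."""
    if retries < 0:
        raise ValueError("retries must be non-negative")
    for attempt in range(retries + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == retries:
                raise  # retries exhausted: the caller marks the server as failed
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demo with a stub that fails twice, recording delays instead of sleeping.
attempts_seen = []
delays = []


def flaky_connect() -> str:
    attempts_seen.append(1)
    if len(attempts_seen) <= 2:
        raise ConnectionError("server not ready")
    return "connected"


print(connect_with_backoff(flaky_connect, sleep=delays.append), delays)
```

The point of the sketch is the last branch: once the retries run out, nothing recovers the connection automatically, which matches the session-restart experience described above.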

CLIs have no such moving parts. They’re binaries on disk. They execute when called and return when done. No background state to manage.

The Comparison Dimensions

Performance

| Dimension | CLI | Skills | MCP |
| --- | --- | --- | --- |
| Context overhead | 0 tokens upfront | Loaded on demand (minimal token overhead) | ~21K tokens for 4 servers (with Tool Search: ~85% less) |
| Call latency | Direct execution | Direct execution | ~3x slower per call, ~9.4x slower on first call (Jira benchmark) |
| Composability | Pipes, jq, grep, redirect | CLI-based, inherits pipeability | JSON-RPC responses, server-constrained |
| Reliability | Binary on disk, no moving parts | No moving parts (in-context instructions) | Server process dependency |

Developer Experience

| Dimension | CLI | Skills | MCP |
| --- | --- | --- | --- |
| Debugging | Run same command in terminal | Run same command in terminal | Inspect JSON transport logs |
| Auth | Existing flows (gh auth, aws sso, kubeconfig) | Inherits CLI auth | Per-server auth, OAuth flow, re-auth cycles |
| Permissions | Granular: allowlist specific commands | Granular: allowlist specific skill content | All-or-nothing: allowlist by tool name only |
| Setup cost | Already installed (gh, aws, psql) | Add skill file to .claude/skills/ | Install server, configure auth, manage process lifecycle |
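To illustrate what "allowlist specific commands" can mean in practice, here is a minimal Python sketch of a command-prefix check. It is not how Claude Code implements permissions; `ALLOWED_PREFIXES` and `is_allowed` are hypothetical, and the check is only meaningful if the command is then executed from the parsed argument list rather than handed to a shell, where characters such as `;` would start a second command.

```python
import shlex

# Hypothetical allowlist: each entry is a command prefix the agent may run.
ALLOWED_PREFIXES = [
    ("gh", "pr", "view"),
    ("gh", "pr", "list"),
    ("kubectl", "get"),
]


def is_allowed(command: str) -> bool:
    """True if the command's leading arguments match an allowed prefix."""
    try:
        argv = tuple(shlex.split(command))
    except ValueError:  # unbalanced quotes and similar parse errors
        return False
    return any(argv[: len(prefix)] == prefix for prefix in ALLOWED_PREFIXES)


print(is_allowed("gh pr view 123"))   # read-only subcommand on the list
print(is_allowed("gh pr merge 123"))  # same tool, subcommand not on the list
```

The granularity is the point: the same binary can be allowed for reads and denied for writes, one subcommand at a time.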

Safety & Security

| Dimension | CLI | Skills | MCP |
| --- | --- | --- | --- |
| Query validation | None (LLM writes raw SQL) | None (LLM writes raw SQL) | Server-level read-only enforcement possible |
| Credential exposure | In prompt or env vars | In prompt or env vars | Managed server-side |
| Attack surface | Existing tool surface | No new surface (no new processes) | Server process per integration, prompt injection risk |

This is the one dimension where MCP has a genuine advantage for production deployments. An MCP Postgres server can enforce read-only mode, block DROP TABLE, and keep credentials out of the prompt. A CLI approach cannot — the LLM either has full psql access or none.
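A server-side guard of this kind can be sketched in Python as follows. This is a deliberately naive illustration, not the implementation of any real MCP Postgres server: the keyword list and the `is_read_only` name are hypothetical, and a production setup would pair a check like this with a read-only database role or read-only transactions rather than rely on string matching.

```python
import re

# Hypothetical deny-list of keywords that modify data, schema, or privileges.
_WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke|into)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Accept only a single SELECT statement containing no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:  # reject stacked statements
        return False
    if not statement.lower().startswith("select"):
        return False
    return _WRITE_KEYWORDS.search(statement) is None


print(is_read_only("SELECT id, title FROM issues WHERE status = 'open';"))
print(is_read_only("DROP TABLE issues"))
print(is_read_only("SELECT 1; DROP TABLE issues"))
```

The check errs on the side of rejecting: a harmless query that mentions a keyword such as `update` inside a string literal is refused too. What matters for the argument is where the check lives: in the server, outside the model's control, with credentials that never enter the prompt.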

Ecosystem Maturity

MCP has the broadest ecosystem. The Anthropic Directory lists hundreds of reviewed connectors. ChatGPT, VS Code, Cursor, and Zed all support MCP. The ecosystem effect is real: if you need to connect to an obscure SaaS tool, MCP probably has a connector and a CLI probably doesn’t.

Skills are Claude Code-only. CLIs are everywhere, but require the LLM to know the correct flags and syntax. This matters less than it used to — modern LLMs handle CLI syntax well.

Decision Matrix: Which Approach for Which Scenario?

| Scenario | Best Fit | Why |
| --- | --- | --- |
| Local dev / personal DB | CLI + Skills | Zero context cost, composable, easy to debug. Skills embed the psql commands. |
| Production DB / shared team | MCP | Read-only enforcement at server level. Credential management matters. |
| Services with mature CLIs (GitHub, AWS, Kubernetes) | CLI | gh, aws, kubectl are battle-tested. LLMs already know their flags. |
| Services without CLIs (Slack, Notion, Linear) | MCP | No better option exists. But prefer Skills + HTTP API when possible. |
| Complex multi-step workflows (commit → review → deploy) | Skills | On-demand loading, composable CLI pipeline steps, easy to maintain. |
| Event-driven reactivity (monitoring alerts, webhook responses) | MCP Channels | MCP’s bidirectional capability (channels) has no CLI equivalent. |
| Non-developer users | MCP | Terminal-free, abstracted auth, graphical tool selection. |
| Rapid prototyping, one-off data transformation | CLI | DuckDB, jq, grep: already installed, already known. |
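Read as a procedure, the matrix above could be encoded roughly like the Python sketch below. The `Scenario` fields and the order in which the checks run are my own assumptions; the matrix itself does not say which row wins when several apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical flags describing an integration; all default to False."""
    has_mature_cli: bool = False
    production_data: bool = False
    event_driven: bool = False
    non_developer_users: bool = False
    multi_step_workflow: bool = False


def recommend(s: Scenario) -> str:
    """Map a scenario to the best-fit approach from the decision matrix."""
    if s.event_driven:
        return "MCP Channels"  # server-pushed events have no CLI equivalent
    if s.production_data or s.non_developer_users:
        return "MCP"           # guardrails, managed credentials, no terminal
    if s.multi_step_workflow:
        return "Skills"        # on-demand instructions wrapping CLI steps
    if s.has_mature_cli:
        return "CLI"
    return "MCP"               # no CLI; Skills + HTTP API where possible


print(recommend(Scenario(has_mature_cli=True)))
print(recommend(Scenario(has_mature_cli=True, multi_step_workflow=True)))
print(recommend(Scenario(production_data=True)))
```

Different teams will order these checks differently; the useful part is that each branch names the specific property that justifies the extra abstraction.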

Key insight: The teams I’ve seen ship the fastest AI workflows aren’t picking one approach. They’re layering them.

The Synthesis: Three Layers, Not One Protocol

  • Layer 1 — CLI for the basics. The tools engineers already use: gh, aws, kubectl, psql, docker. Zero integration cost. The LLM already knows them. When something breaks, you can reproduce it in the terminal immediately.
  • Layer 2 — Skills for the workflows. Structured instruction files that wrap CLI commands with context-specific guidance. A “Deploy to staging” skill embeds the exact aws and kubectl commands, the environment variables, and the validation steps. It loads on demand and costs negligible context.
  • Layer 3 — MCP for the gaps. Services without CLIs (Slack, Notion, Figma). Production databases where safety guardrails matter. Event-driven channels where the server pushes messages into the session.

This is what the Quandri team actually runs. Their stack: “Bash + CLI for tools we already use day-to-day. Skills for repeatable multi-step workflows. MCP for services without a strong CLI, and where team-wide auth or permission scoping matters.”

The Verdict

MCP isn’t dead. But the “one protocol to rule them all” vision is. And that’s healthy.

The backlash has been productive because it forced the ecosystem to reckon with real tradeoffs: context is scarce, latency matters, and every extra abstraction layer costs something. The response — Tool Search, deferred loading, server-level tool timeouts, managed configuration — shows Anthropic is listening. But it also shows that MCP is still catching up to problems that CLIs never had.

If you’re choosing today:

  • Start with CLI. If a command-line tool already exists and your LLM can use it, adding MCP is just another abstraction with no benefit. The token savings alone are worth it.
  • Add Skills for repeatable, multi-step workflows. They’re the most efficient way to give an LLM domain-specific guidance without burning context.
  • Use MCP where it uniquely helps. No CLI? Need read-only database enforcement? Want event-driven reactivity? MCP earns its keep there.

The teams that optimize for minimum abstraction and maximum debuggability will ship faster. The protocol wars are a distraction. The real question is how simple you can keep it.
