
    Abstraction Tax: Measuring What MCP Costs Against CLI and Skills

    When MCP launched in late 2024, Anthropic described it as a “USB-C port for AI applications.” The industry responded by racing to ship MCP servers for Linear, Notion, Slack, GitHub, Postgres, and hundreds more. Every SaaS landing page added “MCP supported” to its feature list.

    A year and a half later, the pendulum has swung hard in the other direction. Engineers are publishing benchmarks showing MCP consuming 10%+ of context windows, adding 3-9x latency, and duplicating functionality that existing CLIs already handle better. “MCP is dead” headlines are multiplying.

    Neither the breathless hype nor the angry backlash tells the full story. From the measurements the community has published, I see a layered integration model where CLIs, Skills, and MCP each solve distinct problems — and the teams shipping the fastest AI workflows are using all three, not betting on one.

    Why This Debate Matters Now

    The stakes aren’t academic. The way you connect your LLM to tools determines:

    • How much of your context window is available for actual work
    • How quickly your agent can act on a request
    • How easy it is to debug when something goes wrong
    • How composable your toolchain is across human and AI workflows

    In 2024, the default answer was “MCP server.” In mid-2026, that answer looks naive. The protocol hasn’t stood still (Anthropic’s Tool Search with deferred loading cut context overhead by 85%+, according to published measurements), but the fundamental architectural questions remain.

    The Three Approaches

    The debate has three real contenders, not two.

    CLI / Bash

    The oldest and most proven. LLMs already understand CLI tools: they were trained on millions of man pages, Stack Overflow answers, and shell scripts. When an agent runs gh pr view 123 or jq '.items[] | select(.status == "open")', it’s using the same interface a human would. No new protocol. No background process. No tool schemas preloaded into context.
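    To make the “same interface a human would use” point concrete, here is a minimal Python sketch of how an agent harness might expose the shell as a single tool: the model emits an argument list, the harness runs it, and stdout goes back to the model. The `run_cli` helper and its timeout are hypothetical, not any specific client’s implementation.

```python
import subprocess
import sys


def run_cli(args: list[str], timeout: float = 30.0) -> str:
    """Run a CLI command and return the text the model would see.

    Hypothetical harness helper: no server process and no schema,
    just a binary on disk whose output is handed back to the LLM.
    """
    result = subprocess.run(
        args,
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        timeout=timeout,      # raises subprocess.TimeoutExpired if the command hangs
    )
    if result.returncode != 0:
        # On failure, surface the exit code and stderr so the model can react.
        return f"exit {result.returncode}: {result.stderr.strip()}"
    return result.stdout


if __name__ == "__main__":
    # In a real session the args might be ["gh", "pr", "view", "123"];
    # here we call the current Python interpreter so the example runs anywhere.
    print(run_cli([sys.executable, "-c", "print('hello from a CLI')"]))
```

    Debugging is the same as for a human: copy the argument list into a terminal and run it.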

    MCP (Model Context Protocol)

    A standardized JSON-RPC protocol for connecting LLMs to external tools. MCP servers define tools with names, descriptions, and parameter schemas. The LLM calls these tools through a client (Claude Code, ChatGPT, VS Code). The promise: one integration pattern for everything. The cost: tool definitions consume context, every call goes through an extra process hop, and the server infrastructure has moving parts.
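    For reference, this is roughly what crosses the wire. The sketch builds an MCP-style tool definition (the shape a server returns from `tools/list`) and a `tools/call` request as a JSON-RPC 2.0 message. The tool name `get_issue` and its `issue_id` parameter are hypothetical, not taken from any real server. The definition is what occupies context; the request is what each call costs in round trips.

```python
import json

# A tool definition as an MCP server would advertise it in a tools/list
# result. The client places definitions like this into the model's context.
# "get_issue" and "issue_id" are hypothetical names for illustration.
TOOL_DEFINITION = {
    "name": "get_issue",
    "description": "Fetch a single issue by its identifier.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "issue_id": {"type": "string", "description": "Issue key, e.g. ENG-123"},
        },
        "required": ["issue_id"],
    },
}


def make_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    print(json.dumps(TOOL_DEFINITION, indent=2))
    print(make_tool_call(1, "get_issue", {"issue_id": "ENG-123"}))
```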

    Skills (Claude Code’s Skill System)

    Anthropic’s less-hyped alternative. Skills are structured instruction files loaded on demand into context. They embed CLI commands, API calls, and workflow instructions that the LLM executes directly. Unlike MCP, Skills don’t load full tool definitions upfront: only a short name and description sit in context until a Skill is invoked, at which point its full instructions load. A Skill for Linear might contain curl commands and jq parsing instructions, giving the LLM everything it needs at negligible context cost compared to pre-loading full MCP tool definitions.
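    The on-demand behavior can be modeled in a few lines. In this hypothetical sketch, only a one-line summary per skill sits in context up front, and the full body is appended only when the skill is invoked. The `Skill` and `SkillRegistry` classes, the example skill text, and the whitespace word count (a crude stand-in for a real tokenizer) are all assumptions for illustration, not how Claude Code is implemented.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # short summary kept in context up front
    body: str         # full instructions, loaded only on invocation


def rough_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words, not a real tokenizer.
    return len(text.split())


class SkillRegistry:
    def __init__(self, skills: list[Skill]):
        self.skills = {s.name: s for s in skills}
        self.context: list[str] = []
        # Up front, only one summary line per skill enters the context.
        for s in skills:
            self.context.append(f"{s.name}: {s.description}")

    def invoke(self, name: str) -> str:
        # The full body is appended only when the skill is actually used.
        body = self.skills[name].body
        self.context.append(body)
        return body

    def context_tokens(self) -> int:
        return sum(rough_tokens(chunk) for chunk in self.context)


if __name__ == "__main__":
    linear = Skill(
        name="linear-lookup",
        description="Look up a Linear issue via the GraphQL API",
        body=(
            "Run curl against the Linear GraphQL endpoint with the API key "
            "from the environment, then pipe the JSON response through jq "
            "to extract the title, state, and assignee."
        ),
    )
    registry = SkillRegistry([linear])
    print("before invoke:", registry.context_tokens())
    registry.invoke("linear-lookup")
    print("after invoke:", registry.context_tokens())
```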

    The Numbers: What Each Approach Costs

    Context Consumption

    The backlash started with numbers. Quandri Engineering’s May 2026 measurements are the most concrete I’ve seen:

    Tool definitions loaded with 4 MCP servers:

    | MCP Server | Tools | Approx. Tokens |
    |---|---|---|
    | Linear | 42 | ~12,807 |
    | Notion | 14 | ~4,039 |
    | Slack | 12 | ~3,792 |
    | Postgres | 9 | ~438 |
    | Total | 77 | ~21,077 |

    That’s 10.5% of Claude’s 200K context window and 16.5% of GPT-4o’s 128K window — consumed before a single tool call is made. Linear alone accounts for 12,807 tokens of tool schemas, even if you only ever use two tools from it.
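    The percentages are easy to re-derive. This snippet sums the rows of the Quandri table and divides by each context window. Note that the rows add up to 21,076 rather than the table’s ~21,077; the one-token gap presumably comes from rounding in the per-server figures.

```python
# Figures from the Quandri table: server -> (tool count, approximate tokens).
SERVERS = {
    "Linear": (42, 12_807),
    "Notion": (14, 4_039),
    "Slack": (12, 3_792),
    "Postgres": (9, 438),
}

CONTEXT_WINDOWS = {"Claude (200K)": 200_000, "GPT-4o (128K)": 128_000}

total_tools = sum(tools for tools, _ in SERVERS.values())
total_tokens = sum(tokens for _, tokens in SERVERS.values())

print(f"{total_tools} tools, ~{total_tokens:,} tokens")
for label, window in CONTEXT_WINDOWS.items():
    # Share of the window consumed before any tool call is made.
    print(f"{label}: {100 * total_tokens / window:.1f}% used")
```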

    The same operation via CLI vs MCP, looking up a Linear issue:

    | Approach | Token Cost |
    |---|---|
    | CLI (curl + GraphQL query) | ~200 tokens total |
    | MCP (tool definitions + call) | ~12,957 tokens total |

    That’s ~65x more tokens for the MCP path. Nearly all of the overhead comes from pre-loading all 42 Linear tool definitions, most of which a single lookup never uses.
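    Dividing the two totals gives the headline ratio, and subtracting the Linear schema figure from the MCP total shows how little of the cost is the call itself. Reading the remaining ~150 tokens as the call and its response is an inference from the tables, not a separately published number.

```python
CLI_LOOKUP = 200          # curl + GraphQL query (from the table)
LINEAR_SCHEMAS = 12_807   # all 42 Linear tool definitions (from the table)
MCP_LOOKUP = 12_957       # tool definitions plus the call (from the table)

ratio = MCP_LOOKUP / CLI_LOOKUP
call_only = MCP_LOOKUP - LINEAR_SCHEMAS  # inferred cost of the call itself
schema_share = LINEAR_SCHEMAS / MCP_LOOKUP

print(f"MCP path: {ratio:.1f}x the CLI path")
print(f"call itself: ~{call_only} tokens; schemas: {schema_share:.1%} of the total")
```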

    Caveat: Anthropic’s Tool Search (deferred loading) changes this picture. With ENABLE_TOOL_SEARCH=true (default in current Claude Code), tool schemas load on-demand rather than upfront. Published measurements show this reduces context usage by 85%+. The “10% of context” problem is largely solved for users on current versions.

    But deferred loading introduces a new cost: the search step. Claude must now discover which tools exist before calling them, which adds a round-trip. It is a tradeoff, and a reminder that the protocol is still maturing.
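    A toy model shows where both the savings and the extra round-trip come from. This is not Anthropic’s Tool Search implementation; it is a hypothetical sketch in which only tool names are known up front, a keyword search picks candidates, and only the matching schemas are loaded. The tool names, schema strings, and the all-words-match search are assumptions.

```python
# Hypothetical catalog: tool name -> full schema text (stand-in strings).
CATALOG = {
    "linear_get_issue": "schema for fetching one Linear issue ...",
    "linear_create_issue": "schema for creating a Linear issue ...",
    "linear_list_projects": "schema for listing Linear projects ...",
    "slack_post_message": "schema for posting a Slack message ...",
    "notion_search_pages": "schema for searching Notion pages ...",
}


def search_tools(query: str, catalog: dict[str, str]) -> list[str]:
    """The extra round-trip: find tools whose name parts all appear in the query.

    A naive keyword match standing in for whatever ranking a real client uses.
    """
    words = set(query.lower().split())
    return [name for name in catalog if all(part in words for part in name.split("_"))]


def load_schemas(names: list[str], catalog: dict[str, str]) -> dict[str, str]:
    # Only the matched schemas enter the context.
    return {name: catalog[name] for name in names}


if __name__ == "__main__":
    matches = search_tools("get the linear issue ENG-123", CATALOG)
    deferred = load_schemas(matches, CATALOG)
    print("eager load:", len(CATALOG), "schemas")
    print("deferred load:", len(deferred), "schema(s):", matches)
```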

    Latency

    The Quandri Engineering benchmarks, building on ejholmes’ analysis, showed the Jira MCP server to be 3x slower per call than direct API access, and 9.4x slower on the first call due to server initialization.

    MCP adds an extra process between the LLM and the target API. Every tool call goes through:

    LLM → Client → MCP Server → API → MCP Server → Client → LLM

    Instead of the CLI path:

    LLM → CLI binary → stdout → LLM

    The extra hop matters less for long-running operations (database queries, file writes) and more for rapid, iterative tool use — which is exactly what agents do most.
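    If you want to check the first-call versus per-call distinction against your own setup, a small timing harness is enough. This sketch times any zero-argument callable and reports the first call separately from the median of the rest. `measure_latency` and the simulated warm-up in the usage example are hypothetical; real numbers depend entirely on your servers and network.

```python
import statistics
import time
from typing import Callable


def measure_latency(call: Callable[[], object], runs: int = 10) -> tuple[float, float]:
    """Return (first_call_seconds, median_of_remaining_calls_seconds).

    The first call is reported separately because it may include
    process startup or connection setup (the cold-start cost).
    """
    if runs < 2:
        raise ValueError("need at least two runs to separate cold from warm")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        timings.append(time.perf_counter() - start)
    return timings[0], statistics.median(timings[1:])


if __name__ == "__main__":
    state = {"warm": False}

    def simulated_tool_call() -> None:
        # Pretend the first call pays a 50 ms initialization cost.
        if not state["warm"]:
            time.sleep(0.05)
            state["warm"] = True
        time.sleep(0.005)

    cold, warm = measure_latency(simulated_tool_call, runs=5)
    print(f"first call {cold * 1000:.1f} ms, steady state {warm * 1000:.1f} ms, "
          f"ratio {cold / warm:.1f}x")
```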

    Reliability

    MCP servers are processes. They fail to start, crash mid-session, and require re-authentication. The Quandri post cites init failures, repeated re-auth, and mid-session tool death as daily friction.

    Claude Code documentation confirms this: stdio servers don’t auto-reconnect after failure. HTTP/SSE servers get five retry attempts with exponential backoff, but after that they’re marked as failed. For a developer in the middle of a complex task, an MCP server failure means restarting the session.
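    The reconnect policy described for HTTP/SSE servers (a fixed number of retries with exponential backoff, then give up) is easy to sketch. The 1-second base delay, the doubling factor, and the `connect` callable are assumptions for illustration, not values taken from Claude Code; `sleep` is injectable so the policy can be tested without waiting.

```python
import time
from typing import Callable


class ServerMarkedFailed(Exception):
    """Raised once every reconnect attempt has been used up."""


def reconnect(
    connect: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Try to reconnect with exponential backoff.

    Returns the attempt number that succeeded. Waits base_delay * 2**k
    between attempts (1s, 2s, 4s, ... with the defaults) and raises
    ServerMarkedFailed after max_attempts failures.
    """
    for attempt in range(1, max_attempts + 1):
        if connect():
            return attempt
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    raise ServerMarkedFailed(f"gave up after {max_attempts} attempts")


if __name__ == "__main__":
    outcomes = iter([False, False, True])  # fail twice, then succeed
    delays: list[float] = []
    n = reconnect(lambda: next(outcomes), sleep=delays.append)
    print(n, delays)  # 3 [1.0, 2.0]
```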

    CLIs have no such moving parts. They’re binaries on disk. They execute when called and return when done. No background state to manage.

    The Comparison Dimensions

    Performance

    | Dimension | CLI | Skills | MCP |
    |---|---|---|---|
    | Context overhead | 0 tokens upfront | Loaded on demand (minimal token overhead) | ~21K tokens for 4 servers (~85% less with Tool Search) |
    | Call latency | Direct execution | Direct execution | ~3x per call, ~9.4x on first call (Jira benchmark) |
    | Composability | Pipes, jq, grep, redirects | CLI-based, inherits pipeability | JSON-RPC responses, server-constrained |
    | Reliability | Binary on disk, no moving parts | No extra process (in-context instructions) | Server process dependency |

    Developer Experience

    | Dimension | CLI | Skills | MCP |
    |---|---|---|---|
    | Debugging | Run the same command in a terminal | Run the same command in a terminal | Inspect JSON transport logs |
    | Auth | Existing flows (gh auth, aws sso, kubeconfig) | Inherits CLI auth | Per-server auth, OAuth flow, re-auth cycles |
    | Permissions | Granular: allowlist specific commands | Granular: allowlist specific skill content | All-or-nothing per tool: allowlist by tool name only |
    | Setup cost | Already installed (gh, aws, psql) | Add a skill file to .claude/skills/ | Install server, configure auth, manage process lifecycle |
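    The permissions row is easier to see side by side. In this hypothetical sketch, a CLI rule is a command prefix (so gh pr view can be allowed while gh pr merge is not), whereas an MCP rule can only name a whole tool. The rule formats are illustrative and are not Claude Code’s actual permission syntax.

```python
import shlex

# Hypothetical CLI rules: allowed command prefixes, as token lists.
# Assumes commands run as an argv list without a shell, so characters
# like ; | && are passed as plain arguments rather than interpreted.
CLI_ALLOW = [["gh", "pr", "view"], ["kubectl", "get"], ["jq"]]

# Hypothetical MCP rules: whole tool names. Whatever a tool does
# internally is allowed once its name is on the list.
MCP_ALLOW = {"linear_get_issue"}


def cli_allowed(command: str) -> bool:
    tokens = shlex.split(command)
    # Allowed if the command starts with any permitted prefix.
    return any(tokens[: len(prefix)] == prefix for prefix in CLI_ALLOW)


def mcp_allowed(tool_name: str) -> bool:
    return tool_name in MCP_ALLOW


if __name__ == "__main__":
    for cmd in ["gh pr view 123", "gh pr merge 123", "kubectl get pods"]:
        print(cmd, "->", cli_allowed(cmd))
    print("linear_get_issue ->", mcp_allowed("linear_get_issue"))
```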

    Safety & Security

    | Dimension | CLI | Skills | MCP |
    |---|---|---|---|
    | Query validation | None (LLM writes raw SQL) | None (LLM writes raw SQL) | Server-level read-only enforcement possible |
    | Credential exposure | In prompt or env vars | In prompt or env vars | Managed server-side |
    | Attack surface | Existing tool surface | Existing tool surface (no new processes) | Server process per integration, prompt injection risk |

    This is the one dimension where MCP has a genuine advantage for production deployments. An MCP Postgres server can enforce read-only mode, block DROP TABLE, and keep credentials out of the prompt. A plain CLI approach has no equivalent layer: the LLM gets whatever access the psql session has, so any read-only guarantee depends entirely on how the database role is configured.
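    As a rough illustration of what “server-level read-only enforcement” means, here is a deliberately naive guard that a hypothetical server could run before forwarding SQL. It is a keyword filter, not a SQL parser, and is easy to bypass; a real deployment should also connect with a read-only database role or run queries inside a read-only transaction. The function name and keyword list are assumptions.

```python
import re

# Keywords a read-only tool should refuse. Illustrative, not exhaustive.
WRITE_KEYWORDS = {
    "insert", "update", "delete", "drop", "alter", "truncate",
    "create", "grant", "revoke", "copy",
}


def is_read_only(sql: str) -> bool:
    """Naive check: one statement, starts with SELECT/WITH, no write keywords."""
    statements = [s for s in sql.split(";") if s.strip()]
    if len(statements) != 1:
        return False  # reject stacked statements like "SELECT 1; DROP TABLE x"
    # Identifiers such as created_at stay whole, so they don't match "create".
    words = re.findall(r"[a-z_]+", statements[0].lower())
    if not words or words[0] not in {"select", "with"}:
        return False
    return not WRITE_KEYWORDS.intersection(words)


if __name__ == "__main__":
    print(is_read_only("SELECT id, created_at FROM issues WHERE status = 'open'"))
    print(is_read_only("SELECT 1; DROP TABLE issues"))
```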

    Ecosystem Maturity

    MCP has the broadest ecosystem. The Anthropic Directory lists hundreds of reviewed connectors. ChatGPT, VS Code, Cursor, and Zed all support MCP. The ecosystem effect is real: if you need to connect to an obscure SaaS tool, MCP probably has a connector and a CLI probably doesn’t.

    Skills are Claude Code-only. CLIs are everywhere, but require the LLM to know the correct flags and syntax. This matters less than it used to — modern LLMs handle CLI syntax well.

    Decision Matrix: Which Approach for Which Scenario?

    | Scenario | Best Fit | Why |
    |---|---|---|
    | Local dev / personal DB | CLI + Skills | Zero context cost, composable, easy to debug. Skills embed the psql commands. |
    | Production DB / shared team | MCP | Read-only enforcement at server level. Credential management matters. |
    | Services with mature CLIs (GitHub, AWS, Kubernetes) | CLI | gh, aws, kubectl are battle-tested. LLMs already know their flags. |
    | Services without CLIs (Slack, Notion, Linear) | MCP | Often the only packaged option, though Skills + HTTP API are preferable when possible. |
    | Complex multi-step workflows (commit → review → deploy) | Skills | On-demand loading, composable CLI pipeline steps, easy to maintain. |
    | Event-driven reactivity (monitoring alerts, webhook responses) | MCP Channels | MCP’s bidirectional capability (channels) has no CLI equivalent. |
    | Non-developer users | MCP | Terminal-free, abstracted auth, graphical tool selection. |
    | Rapid prototyping, one-off data transformation | CLI | DuckDB, jq, grep: already installed, already known. |

    Key insight: The teams I’ve seen ship the fastest AI workflows aren’t picking one approach. They’re layering them.

    The Synthesis: Three Layers, Not One Protocol

    • Layer 1 — CLI for the basics. The tools engineers already use: gh, aws, kubectl, psql, docker. Zero integration cost. The LLM already knows them. When something breaks, you can reproduce it in the terminal immediately.
    • Layer 2 — Skills for the workflows. Structured instruction files that wrap CLI commands with context-specific guidance. A “Deploy to staging” skill embeds the exact aws and kubectl commands, the environment variables, and the validation steps. It loads on demand and costs negligible context.
    • Layer 3 — MCP for the gaps. Services without CLIs (Slack, Notion, Figma). Production databases where safety guardrails matter. Event-driven channels where the server pushes messages into the session.
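    The layering can be read as a simple routing order. This hypothetical sketch sends a task straight to MCP when server-side guardrails are required, to a Skill when a registered workflow covers it, to a CLI when the binary is on PATH, and otherwise to MCP to fill the gap. The `Task` fields, service names, and skill registry are assumptions for illustration.

```python
import shutil
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    service: str
    cli: Optional[str] = None        # binary that covers this service, if any
    workflow: Optional[str] = None   # name of a multi-step workflow, if any
    needs_guardrails: bool = False   # e.g. shared production database


def choose_layer(task: Task, skills: set[str]) -> str:
    if task.needs_guardrails:
        return "mcp"    # Layer 3: server-side read-only / credential control
    if task.workflow and task.workflow in skills:
        return "skill"  # Layer 2: repeatable multi-step workflow
    if task.cli and shutil.which(task.cli):
        return "cli"    # Layer 1: binary already available
    return "mcp"        # Layer 3: no CLI, fill the gap with a server


if __name__ == "__main__":
    skills = {"deploy-to-staging"}
    tasks = [
        Task("staging", workflow="deploy-to-staging"),
        Task("slack"),
        Task("prod-db", cli="psql", needs_guardrails=True),
        Task("local-python", cli=sys.executable),
    ]
    for task in tasks:
        print(task.service, "->", choose_layer(task, skills))
```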

    This is what the Quandri team actually runs. Their stack: “Bash + CLI for tools we already use day-to-day. Skills for repeatable multi-step workflows. MCP for services without a strong CLI, and where team-wide auth or permission scoping matters.”

    The Verdict

    MCP isn’t dead. But the “one protocol to rule them all” vision is. And that’s healthy.

    The backlash has been productive because it forced the ecosystem to reckon with real tradeoffs: context is scarce, latency matters, and every extra abstraction layer costs something. The response — Tool Search, deferred loading, server-level tool timeouts, managed configuration — shows Anthropic is listening. But it also shows that MCP is still catching up to problems that CLIs never had.

    If you’re choosing today:

    • Start with CLI. If a command-line tool already exists and your LLM can use it, adding MCP is just another abstraction with no benefit. The token savings alone are worth it.
    • Add Skills for repeatable, multi-step workflows. They’re the most efficient way to give an LLM domain-specific guidance without burning context.
    • Use MCP where it uniquely helps. No CLI? Need read-only database enforcement? Want event-driven reactivity? MCP earns its keep there.

    The teams that optimize for minimum abstraction and maximum debuggability will ship faster. The protocol wars are a distraction. The real question is how simple you can keep it.
