CLI vs MCP: How AI Agents Actually Decide Which Tool to Call

Jia Chen

21 May 2026 • 10 min read

When you watch a coding agent work, the same kind of task can travel down two very different paths. Ask it to view a file and it might run cat README.md in a shell. Ask it to do "the same thing" through a Model Context Protocol (MCP) server and it will instead emit a structured filesystem.read_file({"path":"README.md"}) JSON-RPC call. Ask it to commit changes and it might shell out to git commit -m "..." — or invoke a git_commit tool from a GitHub MCP server. Ask it to update a CRM record and the same agent will almost always reach for the HubSpot MCP server rather than try to script the HubSpot HTTP API by hand.

To users, this looks like a black box. In reality, the decision is the product of (1) what tools the agent's framework actually exposes to the model at decode time, (2) the natural-language descriptions and schemas attached to those tools, (3) the model's training-data priors, and (4) explicit policies in the system prompt. This article walks through the mechanism scientifically, with concrete git and HubSpot examples, and explains why frameworks like Claude Code, Cursor, OpenAI's Agents SDK, and LangGraph make different default choices.

What "CLI" and "MCP" actually are, from the model's point of view

From the LLM's perspective, both CLI and MCP are just tools — a name, a description, and a JSON schema injected into its context, paired with a runtime that executes whatever the model emits. The difference is the shape of that tool.

A "CLI tool" in modern agent frameworks is almost always a single high-privilege tool. Anthropic's Claude API exposes bash_20250124 and text_editor_20250124 as Anthropic-schema tools that run in the client's environment; the model calls them with a command string and the client executes it (Anthropic, "Tool use with Claude," docs.claude.com). Claude Code's Bash tool is the canonical example: the LLM emits arbitrary shell, and the harness runs it with permission checks. OpenAI's Agents SDK exposes an analogous LocalShellTool. The "tool surface" is tiny — usually one or two entries — and the model relies on its training-data fluency in shell to actually drive the system.

An "MCP tool" is the opposite shape. The Model Context Protocol, introduced by Anthropic in November 2024, is an open JSON-RPC standard that lets an MCP server advertise a discrete catalog of typed tools (and optionally resources and prompts) to any MCP client (modelcontextprotocol.io). When Claude Code, Cursor, VS Code, or ChatGPT connects to, say, github-mcp-server, it calls tools/list, gets back schemas for create_pull_request, get_issue, list_commits, and so on, and registers each one as an independently callable function. Each tool has a name, a docstring, and a strict input schema.

So the first scientific point is simple: CLI and MCP are not competing technologies — they are two ends of a spectrum of tool granularity. CLI = one general-purpose, untyped, high-context-prior tool. MCP = many narrow, typed, low-context-prior tools.

How a model picks a tool at decode time

Modern tool-using LLMs (Claude 3.5+, GPT-4o/5, Gemini 2.x) are trained with a special system prompt that injects all tool definitions into context and teaches the model to emit a tool_use block when it wants to act. Anthropic documents this explicitly: Claude's tool-use system prompt costs 313–346 tokens on top of the tools array itself, and tool definitions are included in every request (Anthropic, "Tool use with Claude").

The model's choice among available tools is not a hard-coded router. It is autoregressive next-token prediction conditioned on:

The user request (e.g. "commit my staged changes with message 'fix typo'").
2. The tool list and descriptions in the system prompt. Anthropic's tool-writing guide is unusually frank about this: "prompt-engineering your tool descriptions and specs… can collectively steer agents toward effective tool-calling behaviors. Even small refinements to tool descriptions can yield dramatic improvements" (Aizawa, "Writing effective tools for AI agents," anthropic.com, 2025).
3. The model's pretraining prior. LLMs have seen orders of magnitude more git commit invocations on GitHub, Stack Overflow, and in tutorials than they have seen mcp__github__create_commit JSON-RPC calls. Descope's analysis of the March 2026 "MCP is dead" debate summarizes this bluntly: "models are fluent in CLI patterns in a way they're not yet fluent in MCP's JSON-RPC schemas" (Ganguly, "MCP vs. CLI: When to Use Them and Why," descope.com).
4. Explicit policy in the system prompt or AGENTS.md / CLAUDE.md (e.g. "Always prefer the hubspot MCP server over curl for CRM operations").

In practice, this means the decision is heavily biased by which tools are loaded and how they are described. If you connect a GitHub MCP server, the agent will tend to use it for GitHub-specific actions because the tool is right there with a clean schema. If you don't, the agent falls back to bash + gh or bash + git, because those patterns dominate its training data.

Concrete example 1: viewing a file (`cat` vs. a filesystem MCP)

Suppose the user says "show me the first 50 lines of server.py."

Without MCP, Claude Code will emit a single tool call to its built-in Read tool (which is itself a structured file-reader, not raw cat) or to Bash with sed -n '1,50p' server.py. The model picks the structured Read tool because Claude Code's system prompt specifically tells it to prefer Read over cat, partly to enforce the 25,000-token output cap that Claude Code applies to tool results (Anthropic, "Connect Claude Code to tools via MCP," code.claude.com).

With a filesystem MCP server (e.g. @modelcontextprotocol/server-filesystem), the same request would route to filesystem.read_text_file({"path":"server.py","head":50}). The schema is stricter, the result is namespaced, and the call can be audited and permissioned per-tool.

Which is "right"? For a single developer working locally, the CLI/built-in path is strictly cheaper: no extra schema in context, no JSON-RPC round trip, and the model is already fluent in shell. For a multi-tenant SaaS agent that must enforce "user A can only read paths under /tenants/A/," the MCP path is the only safe one, because MCP's authorization spec mandates OAuth 2.1 with PKCE and per-tool scopes (modelcontextprotocol.io; Descope, 2026).

Concrete example 2: `git commit` vs. a Git/GitHub MCP

git is the canonical case where CLI usually beats MCP for a solo developer. The command surface is small, the model has seen millions of git add/git commit/git push sequences in training, and the operations are local. A Git MCP server has to re-expose the same verbs through a JSON schema that the model has seen far less often, while also paying the MCP "schema tax" — a typical MCP server with dozens of tools "can consume tens of thousands of tokens before the agent executes a single operation" (Descope, 2026; Smithery, "MCP vs CLI Is the Wrong Fight," 2026).

The Smithery benchmark, a 756-run study across three models and three APIs, found that "CLI agents consistently used fewer tokens per simple operation than their MCP counterparts" for well-known local tooling — exactly the regime git lives in. But the same benchmark found that for remote and unfamiliar APIs, "native MCP tool integration consistently gave agents the best chance of completing the task" (summarized in Descope, 2026).

The decision flips the moment the request crosses a network boundary or touches GitHub-the-service rather than git-the-binary. "Open a PR from feature/x to main and request review from AI's Memento Problem: Why LLMs Can't Form New Memories

Concrete example 3: HubSpot — why almost nobody uses CLI here

HubSpot has no first-party CLI for CRM record manipulation. To update a contact from a shell, an agent would have to write something like:

  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"properties":{"lifecyclestage":"customer"}}'

Every part of that is fragile for an LLM: it has to remember the exact endpoint, hand-build JSON, manage the token in an environment variable, and parse a free-form response. Anthropic's own engineering team observed this pattern: "a common error we've observed is tools that merely wrap existing software functionality or API endpoints — whether or not the tools are appropriate for agents… LLM agents have limited context… the better and more natural approach is to skip to the relevant page first" (Aizawa, anthropic.com, 2025).

The HubSpot MCP server (@hubspot/mcp-server) collapses all of this into typed tools like crm-update-object, crm-search-objects, crm-create-engagement. The schema makes the valid properties discoverable, auth is handled once via OAuth, and the response is structured. This is the regime where MCP dominates by design — third-party SaaS with rich object models, per-user auth, and no meaningful local CLI.

The general principle that emerges across these three examples: CLI wins when the model already knows the verbs and the data is local; MCP wins when the verbs are service-specific, the auth must be per-user, and the response shape needs to be machine-readable.

Why frameworks behave differently

Different agent frameworks expose tools differently, which changes the model's revealed preference even when the underlying LLM is identical.

Claude Code ships with a strong built-in toolset (Read, Write, Edit, Bash, Grep, Glob, WebFetch) and adds MCP servers on top. As of recent versions it also defers MCP tool definitions until needed via "Tool Search" — by default only tool names load at session start, and full schemas are fetched on demand (code.claude.com). The practical effect is that Claude Code biases toward its built-ins for local file/git work and toward MCP for anything cross-service.

Cursor and VS Code expose MCP through a similar client-side registry but rely more heavily on their own editor APIs for file editing; their shell tool is typically used as the CLI fallback.

OpenAI's Agents SDK and Responses API support both function calling (the original 2023 schema-based pattern, conceptually identical to a single MCP server) and a local_shell tool. OpenAI added first-class MCP support in 2025 via the Responses API's mcp tool type, but historically the SDK's docs emphasize narrow function tools, which biases developers toward an MCP-shaped pattern even without using MCP.

LangGraph and LangChain treat every tool — CLI wrapper, MCP tool, or Python function — as a Tool object with a name, description, and Pydantic schema. The framework itself is agnostic; the developer's bind_tools(...) call determines what the model sees. This is why two LangGraph agents pointed at the same LLM can make opposite CLI-vs-MCP choices.

Perplexity publicly stated in March 2026 that it was moving away from MCP internally toward "traditional APIs and CLIs" to reduce context overhead and auth friction at scale — though it still maintains a public MCP server (Ganguly, descope.com, 2026). This is a useful counter-data-point: at extreme scale, the MCP schema tax is real enough to justify a custom path.

The hidden cost: tokens and the schema tax

The single most empirically grounded argument against defaulting to MCP everywhere is context cost. Anthropic's own engineering blog quantifies it: a naive MCP setup with many servers can push 150,000 tokens of tool definitions into context before the agent does anything; the "code execution with MCP" pattern, where the agent writes TypeScript against a generated tool tree and only imports what it needs, drops that to about 2,000 tokens — a 98.7% reduction (Jones & Kelly, "Code execution with MCP," anthropic.com, 2025). Cloudflare reports a similar result with its "Code Mode" pattern, which collapses thousands of endpoints behind a search() + execute() pair.

CLI sidesteps this entirely: bash is one tool with one short description, and the model fills in the rest from its prior. That is exactly why Smithery's benchmark found CLI cheaper for simple, well-known operations — and why the same benchmark found CLI more expensive for unfamiliar APIs, because the model burns tokens browsing --help output, parsing free-form text, and retrying malformed arguments.

This is the scientifically defensible takeaway: the CLI-vs-MCP token tradeoff is U-shaped in task familiarity. For verbs the model knows cold (git, cat, grep, ffmpeg), CLI is cheaper. For verbs it has never seen with a structured schema (your internal billing API, an obscure SaaS), CLI is more expensive because of trial-and-error overhead. MCP flattens that curve at the cost of upfront schema tokens — a cost that progressive disclosure / code execution / tool search now largely mitigate.

A practical decision framework

Pulling the literature together, a defensible rubric is:

Use CLI (or the framework's built-in equivalent) when the tool is local, the verb is in the model's training data, you trust the host's credentials, and you don't need per-user auth or tenant isolation. Classic examples: git, cat/Read, grep/Grep, docker, ffmpeg, kubectl against a personal cluster.

Use MCP when the tool is a remote SaaS or internal service, the API is large or unfamiliar to the model, you need OAuth 2.1 / PKCE per-user scoping, you need structured audit trails, or you want a tool catalog that can be discovered and reused across agents. Classic examples: HubSpot, Salesforce, Notion, Linear, Sentry, internal company APIs, multi-tenant production systems.

Use both, with an explicit preference encoded in CLAUDE.md / AGENTS.md / your system prompt, when you're building anything serious. Anthropic's, Cloudflare's, and Smithery's research all converge on a hybrid: CLI for local Unix-philosophy composition, MCP for remote services, and code execution as the bridge that prevents either layer from polluting the context window.

Closing: the decision is not really the model's

The framing "the agent decided to use MCP" is misleading. The agent emits the highest-probability tool call given the tools you exposed, the descriptions you wrote, the priors baked into its weights, and the instructions in its system prompt. Every one of those is a design choice made by the framework author and the developer — not by the model.

If you want predictable behavior, the levers are concrete: curate the tool set, write tool descriptions like you're onboarding a new hire (Aizawa, 2025), namespace related tools, prefer progressive disclosure over loading every schema upfront, and state your preferences explicitly in the agent's persistent instructions. Do that and the "black box" mostly opens up.s

References

Anthropic. "Tool use with Claude." Claude API documentation. https://docs.claude.com/en/docs/build-with-claude/tool-use/overview
Anthropic. "Connect Claude Code to tools via MCP." Claude Code documentation. https://docs.claude.com/en/docs/claude-code/mcp
Aizawa, K. "Writing effective tools for AI agents — using AI agents." Anthropic Engineering, 2025. https://www.anthropic.com/engineering/writing-tools-for-agents
Jones, A., & Kelly, C. "Code execution with MCP: building more efficient AI agents." Anthropic Engineering, 2025. https://www.anthropic.com/engineering/code-execution-with-mcp
Model Context Protocol. "What is the Model Context Protocol (MCP)?" https://modelcontextprotocol.io/docs/getting-started/intro
Ganguly, R. "MCP vs. CLI: When to Use Them and Why." Descope Blog, March 19, 2026. https://www.descope.com/blog/post/mcp-vs-cli
Smithery. "MCP vs CLI Is the Wrong Fight" (756-run benchmark across three models and three APIs), 2026. Repository: github.com/smithery-ai/mcp-vs-cli-bench
Cloudflare. "Code Mode" pattern for MCP context reduction, referenced in (4) and (6).
HubSpot. Official MCP server: @hubspot/mcp-server on npm and the HubSpot Developers MCP docs.
GitHub. github-mcp-server reference implementation.