Code Mode and MCP: Let Agents Write Code, Not JSON
Cloudflare and Anthropic converge on a better pattern for tool use.
Tool calling is one of those ideas that sounds obvious until you scale it: give a model a list of tools, let it emit a little structured JSON, run the tool, feed the result back, repeat. It works—until you connect real systems, with dozens of tools, large payloads, and workflows that require chaining calls.
Two independent teams arrived at the same first-principles conclusion in 2025: models are better at writing code than they are at speaking a synthetic tool-call language.
- Cloudflare calls this approach “Code Mode”: turn MCP tool schemas into a TypeScript API and have the model write TypeScript that calls it.
https://blog.cloudflare.com/code-mode/
- Anthropic describes the same core pattern as “code execution with MCP”: present tools as code APIs, execute the code, and only pass back what the model needs.
https://www.anthropic.com/engineering/code-execution-with-mcp
This post breaks down what’s going on, why it works, and how to apply it when you’re building agentic systems.
What MCP Actually Solves
MCP (Model Context Protocol) is fundamentally about standardizing how agents discover and use external capabilities:
- A uniform protocol for tool definitions and documentation
- A uniform way to connect and authorize
- A universal “plug” between agents and services
From first principles, MCP reduces integration cost. You implement MCP once and gain an ecosystem of tools.
Where things get messy is the typical usage pattern: expose every tool directly to the model and ask it to emit tool-call JSON. That puts the model in a role it’s not naturally optimized for.
Why Direct Tool Calling Breaks Down
Direct tool calling stresses two scarce resources: context and attention.
1) Tool definitions bloat the context window
Many MCP clients preload tool definitions so the model can “see” everything. At scale, this is expensive and slow. You’re spending tokens before the model has even read the user’s request.
2) Intermediate results have to pass through the model
The classic loop looks like:
- Model calls tool A
- Tool A returns a big payload
- Payload is stuffed into the model context
- Model calls tool B, copying pieces from A’s result
If the payload is large (documents, logs, tables), you waste tokens, increase latency, and increase error rate. You’re forcing the model to act like a lossy buffer between two systems.
The Code Mode Shift
Code Mode flips the default:
Instead of saying, “model, call tools,” you say, “model, write a small program that uses a typed API; we will execute it safely.”
This is powerful for one reason: code is the model’s native medium. Modern models have seen huge amounts of TypeScript/JavaScript in the wild. A synthetic tool-call schema is comparatively niche.
Cloudflare’s twist is to generate TypeScript interfaces from MCP schemas and execute the model-authored code inside a sandboxed environment, with authorization handled outside the sandbox so keys don’t leak.
https://blog.cloudflare.com/code-mode/
Anthropic describes the same win from a systems angle: load fewer tool definitions and avoid pushing large intermediate results through the model; compute in the execution environment and pass back only what’s needed.
https://www.anthropic.com/engineering/code-execution-with-mcp
A Simple Mental Model
If you want to reason about Code Mode, think in three layers:
- The model writes code (planning + structure).
- The sandbox runs code (loops, transforms, aggregation).
- The MCP client handles connectivity and auth (discovery, tokens, access control).
That separation is why the pattern is both cheaper and safer.
Why It’s Token-Efficient
Most of the savings come from reducing “round trips through the model.”
With direct tool calls, every step is:
- model emits a call
- tool returns a result
- result is injected into context
- model reads it and emits the next call
With Code Mode, the model can write a loop once, and the loop can run without re-serializing all intermediate state into the model context. The model only needs the final output or a small summary.
Cloudflare reports large token reductions on chained tasks when the model is allowed to express the workflow as code rather than a series of discrete tool calls.
https://blog.cloudflare.com/code-mode/
Anthropic reports similar gains when switching from preloading tool definitions and piping results through the model to a code-execution approach.
https://www.anthropic.com/engineering/code-execution-with-mcp
Why It’s Safer
Safety is not a vibe. It’s architecture.
Code Mode makes it easier to enforce “least privilege”:
- The sandbox can be isolated (no arbitrary network access).
- The available APIs can be whitelisted (only the MCP-backed interfaces).
- Secrets can live outside the sandbox (the execution harness injects auth).
Cloudflare explicitly highlights that the agent code can’t leak tokens if it never has access to them in the first place.
https://blog.cloudflare.com/code-mode/
What This Means for OutcomeDev
OutcomeDev’s core belief is that the right unit of work is not “a completion.” It’s a verified change.
Code Mode aligns with that goal because it turns agentic work into something we can:
- constrain with interfaces
- sandbox with hard boundaries
- verify with tests and commands
In practice, the strongest pattern is hybrid:
- Use direct tool calling for simple, single-shot queries.
- Switch to Code Mode when you need loops, branching logic, or many tool calls.
That’s the difference between “an AI that can call tools” and “an AI that can run workflows.”
Sources
- Cloudflare: Code Mode: the better way to use MCP — https://blog.cloudflare.com/code-mode/
- Anthropic: Code execution with MCP — https://www.anthropic.com/engineering/code-execution-with-mcp