Why Your AI Agent Is Drowning in Tools (And How Code Mode Saves It)
When an AI agent integrates many MCP tools, it risks context bloat and tool hallucination: 50+ tools can eat 5–7% of the context window. Traditional remedies such as agent-side filtering and MCP-side reduction come with trade-offs. Code mode lets the LLM search and execute tools via code, which can slash token usage and enable complex control flow, at the cost of extra debugging effort and infrastructure. The Cloudflare and Anthropic examples point to the real lesson: keep a reasonable toolset driven by use cases, not magic numbers.
Imagine you use various MCP servers for work. As a developer, you might connect a Figma MCP, Context7 MCP, or Jira MCP server to your agent, letting you use their tools through natural language. Sounds perfect, right?
But you've probably already hit the wall: too many tools flooding your LLM's context window.
This creates two critical problems. First, context window bloat. Every tool name, description, and parameter schema consumes tokens on every request. At 50+ tools, this can eat 5 to 7 percent of the model's context before a single user message arrives, crowding out conversation history, document content, and reasoning space.
Second, tool hallucination. When you have too many semantically similar tools, the LLM starts inventing tool names that don't exist, conflating parameters between tools, or calling the right tool with arguments from a different tool's schema. For a deeper dive into this issue, check out the excellent article "MCP Tool Design."
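To get a feel for the context-bloat overhead, you can serialize the tool definitions an agent sends and estimate their size. The sketch below is illustrative only. The tool list is made up. It also assumes the common rule of thumb of about four characters per token, which varies by tokenizer and model. The `ToolDefinition` shape keeps just the `name`, `description`, and `inputSchema` fields of an MCP tool, since those are what typically end up in the prompt.

```typescript
// Fields of an MCP tool definition (as returned by tools/list) that are
// typically serialized into the prompt on every request.
interface ToolDefinition {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
}

// Assumption: roughly 4 characters per token. Real tokenizers vary,
// so treat the result as a ballpark figure, not a measurement.
const CHARS_PER_TOKEN = 4;

function estimateToolTokens(tools: ToolDefinition[]): number {
  return Math.ceil(JSON.stringify(tools).length / CHARS_PER_TOKEN);
}

// Fraction of the context window consumed before any user message arrives.
function contextShare(tools: ToolDefinition[], contextWindowTokens: number): number {
  return estimateToolTokens(tools) / contextWindowTokens;
}

// 50 hypothetical tools, each with a modest description and schema.
const tools: ToolDefinition[] = Array.from({ length: 50 }, (_, i) => ({
  name: `tracker_tool_${i}`,
  description: "Hypothetical tool that reads or updates issues in a tracker. ".repeat(3),
  inputSchema: {
    type: "object",
    properties: {
      issueKey: { type: "string", description: "Issue key, e.g. PROJ-123" },
      fields: { type: "array", items: { type: "string" } },
    },
    required: ["issueKey"],
  },
}));

const share = contextShare(tools, 200_000);
console.log(`~${estimateToolTokens(tools)} tokens, ${(share * 100).toFixed(1)}% of a 200k-token window`);
```

Running the same estimate against the definitions your agent actually sends is a quick way to check whether they are crowding out conversation history in your own setup.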
So how do we fix this?
The most straightforward approach is reducing the number of tools. You can do this in two ways: limit what the AI agent sees, or decrease the number of tools on the MCP side.
With agent-side filtering, you define a curated set of tools that solves the specific problem your AI agent is facing. You request the full tool listing from the MCP server, then filter it before passing it to the LLM. Fewer tools mean less context consumed.
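Here is a minimal sketch of agent-side filtering. It takes the full list returned by `tools/list` and keeps only an allowlist before building the prompt. The allowlist contents, the tool names, and the `filterTools` helper are hypothetical. In practice the full list would come from your MCP client library's list-tools call.

```typescript
interface ToolDefinition {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
}

// Hypothetical allowlist for a "triage bug reports" agent.
const TRIAGE_TOOLS = new Set(["jira_search_issues", "jira_get_issue", "jira_add_comment"]);

// Keep only allowlisted tools. Also report allowlisted names the server no
// longer offers, so renamed or removed tools surface during re-evaluation.
function filterTools(
  allTools: ToolDefinition[],
  allowlist: Set<string>,
): { selected: ToolDefinition[]; missing: string[] } {
  const selected = allTools.filter((t) => allowlist.has(t.name));
  const offered = new Set(allTools.map((t) => t.name));
  const missing = Array.from(allowlist).filter((n) => !offered.has(n));
  return { selected, missing };
}

// Tools as a (hypothetical) Jira MCP server might list them.
const serverTools: ToolDefinition[] = [
  { name: "jira_search_issues", inputSchema: { type: "object" } },
  { name: "jira_get_issue", inputSchema: { type: "object" } },
  { name: "jira_create_sprint", inputSchema: { type: "object" } },
  { name: "jira_delete_project", inputSchema: { type: "object" } },
];

const { selected, missing } = filterTools(serverTools, TRIAGE_TOOLS);
// Only `selected` is handed to the LLM as its tool list.
console.log(selected.map((t) => t.name), missing);
```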
Benefits:
Very transparent: every call and argument is visible and easy to debug
Simple to implement: follow the MCP spec and you're done
Drawbacks:
Bloat: Large APIs (hundreds or thousands of endpoints) still create tool bloat, even after filtering
Round-trips: Multi-step workflows require many round-trips
Complex control flow: Loops, retries, and branching are clumsy when expressed as individual tool invocations
Ongoing evaluation: You need to reevaluate new tools to see if they fit your scenario
If you own the MCP server, you can reduce tools at the source. Don't just wrap every REST API call in a tool. Think about the use cases you're trying to solve (again, see "MCP Tool Design"). A use-case tool might combine a couple of API calls, or wrap a single API call with business logic.
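To make this concrete, here is a sketch of one use-case tool that replaces two thin REST wrappers. Everything in it is hypothetical. The `IssueApi` interface stands in for whatever REST client your server already uses. `getIssueTriageSummary` is an invented handler you would expose as a single tool. The "needs attention" rule is made-up business logic. The tool fetches an issue and its comments, applies that logic, and returns one compact result, so the model never has to chain the calls itself. Registration with an MCP server SDK is left out.

```typescript
// Hypothetical REST client the MCP server already uses internally.
interface Issue { key: string; title: string; status: string; priority: string }
interface IssueComment { author: string; body: string }
interface IssueApi {
  getIssue(key: string): Promise<Issue>;
  listComments(key: string): Promise<IssueComment[]>; // assumed oldest first
}

interface TriageSummary {
  key: string;
  title: string;
  status: string;
  commentCount: number;
  latestComment: string | null;
  needsAttention: boolean;
}

// One use-case tool instead of exposing getIssue and listComments separately.
async function getIssueTriageSummary(api: IssueApi, key: string): Promise<TriageSummary> {
  // The two underlying calls are independent, so run them concurrently.
  const [issue, comments] = await Promise.all([api.getIssue(key), api.listComments(key)]);
  // Business logic lives in the tool, not in the prompt (hypothetical rule):
  // open high-priority issues that nobody has commented on need attention.
  const needsAttention = issue.status !== "Done" && issue.priority === "High" && comments.length === 0;
  return {
    key: issue.key,
    title: issue.title,
    status: issue.status,
    commentCount: comments.length,
    // Return only the latest comment rather than the whole thread.
    latestComment: comments.length > 0 ? comments[comments.length - 1].body : null,
    needsAttention,
  };
}

// In-memory fake so the sketch runs without a real backend.
const fakeApi: IssueApi = {
  getIssue: async (key) => ({ key, title: "Login fails on Safari", status: "Open", priority: "High" }),
  listComments: async () => [],
};

getIssueTriageSummary(fakeApi, "PROJ-42").then((summary) => console.log(summary));
```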
This is an iterative process. Our team went through it too, taking numerous iterations to clean up our tools. Once you have clear use cases, make sure each tool addresses a distinct problem; that avoids the semantically overlapping tools that trigger hallucination.
If you still end up with too many tools (a problem our team is gradually running into), consider configuration options. You might add a persistence layer where users pick and choose their tools, or provide configuration options that hide certain tools when the MCP server runs locally. Each approach has trade-offs.
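For the local-configuration option, one possible shape is a registry that consults a user-supplied config before advertising tools. The `ToolRegistry` class and the `enabledTools`/`disabledTools` keys are assumptions for illustration, not part of the MCP specification. The precedence rule is also an assumption: an enable list, if present, restricts the set, and the disable list is applied on top of it.

```typescript
interface ToolDefinition { name: string; description?: string; inputSchema: Record<string, unknown> }

// Hypothetical user configuration, e.g. loaded from a local JSON file.
interface ToolConfig {
  enabledTools?: string[];  // if set, only these tools are advertised
  disabledTools?: string[]; // always hidden from tools/list
}

class ToolRegistry {
  private tools = new Map<string, ToolDefinition>();

  constructor(private config: ToolConfig = {}) {}

  register(tool: ToolDefinition): void {
    this.tools.set(tool.name, tool);
  }

  // What the server would return from tools/list once the config is applied.
  list(): ToolDefinition[] {
    const enabled = this.config.enabledTools ? new Set(this.config.enabledTools) : null;
    const disabled = new Set(this.config.disabledTools ?? []);
    return Array.from(this.tools.values()).filter(
      (t) => (enabled === null || enabled.has(t.name)) && !disabled.has(t.name),
    );
  }
}

const registry = new ToolRegistry({ disabledTools: ["delete_project"] });
for (const toolName of ["search_issues", "get_issue", "delete_project"]) {
  registry.register({ name: toolName, inputSchema: { type: "object" } });
}
console.log(registry.list().map((t) => t.name));
```

Hiding a tool from `tools/list` only keeps it out of the context. If the goal is also to block it, the server should refuse calls to disabled names as well.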
Here's where things get interesting. Instead of exposing tools directly to the LLM, you make them searchable and executable through code. This might sound similar to tool reduction, but the mechanism is fundamentally different.
The key insight: LLMs, trained on massive code datasets, should be better at writing code than at calling tools directly. Cloudflare pioneered this approach in its Code Mode article, reporting impressive results:
Can handle more (and more complex) tools
Dramatically reduces context window usage when multiple calls are needed
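To see why this helps with multi-step work, take a task like "close every resolved bug that hasn't been updated in 30 days." With direct tool calls, each page fetch and each update is a separate tool call whose result flows back through the model. In code mode, the model writes one program against a typed SDK, and only the final summary returns to the context. The `IssueSdk` interface and its methods are hypothetical stand-ins for an SDK generated from MCP tool schemas. The fake SDK exists only so the sketch runs.

```typescript
// Hypothetical typed SDK generated from MCP tool schemas.
interface Issue { key: string; status: string; updatedDaysAgo: number }
interface IssuePage { issues: Issue[]; nextCursor: string | null }
interface IssueSdk {
  searchIssues(args: { status: string; cursor?: string }): Promise<IssuePage>;
  closeIssue(args: { key: string }): Promise<void>;
}

// The kind of async arrow function a model might generate in code mode.
// Pagination, filtering and the loop all run in the sandbox; only the
// returned summary is sent back to the model.
const program = async (sdk: IssueSdk) => {
  const closedKeys: string[] = [];
  let cursor: string | undefined = undefined;
  do {
    const page: IssuePage = await sdk.searchIssues({ status: "Resolved", cursor });
    for (const issue of page.issues) {
      if (issue.updatedDaysAgo > 30) {
        await sdk.closeIssue({ key: issue.key });
        closedKeys.push(issue.key);
      }
    }
    cursor = page.nextCursor ?? undefined;
  } while (cursor !== undefined);
  return { closedCount: closedKeys.length, closed: closedKeys };
};

// In-memory fake SDK with two pages that counts the underlying API calls.
function makeFakeSdk() {
  let calls = 0;
  const pages: Record<string, IssuePage> = {
    start: {
      issues: [
        { key: "BUG-1", status: "Resolved", updatedDaysAgo: 45 },
        { key: "BUG-2", status: "Resolved", updatedDaysAgo: 3 },
      ],
      nextCursor: "p2",
    },
    p2: { issues: [{ key: "BUG-3", status: "Resolved", updatedDaysAgo: 90 }], nextCursor: null },
  };
  const sdk: IssueSdk = {
    searchIssues: async ({ cursor }) => { calls++; return pages[cursor ?? "start"]; },
    closeIssue: async () => { calls++; },
  };
  return { sdk, apiCalls: () => calls };
}

const fake = makeFakeSdk();
program(fake.sdk).then((result) => console.log(result, `${fake.apiCalls()} API calls in one execution`));
```

Several API calls happen here, but the model sees a single execution and a two-field result. With direct tool calls, every one of those requests and responses would pass through the context.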
Let's examine the trade-offs.
Benefits:
Massive token reduction: Only a small SDK + a few examples are in context instead of the whole API schema. Multi-step workflows become one generated program + one execution, instead of many tool calls.
Better control flow: Loops, conditionals, retries, and batching are just normal code.
Fewer LLM round-trips: One execution can encapsulate dozens of real API calls.
Stronger isolation: Code runs in a well-scoped sandbox with tightly controlled outbound access.
Drawbacks:
Harder to debug: You inspect generated code and its logs, not a clean sequence of tool calls.
Requires infrastructure: You need a code execution environment and sandboxing.
Less "pure MCP": You're layering a mini runtime and SDK on top.
This approach can be applied on either the agent side or the MCP side. Cloudflare's implementation works on the agent side. When you connect to an MCP server in "code mode," the Agents SDK fetches the MCP server's schema and converts it into a TypeScript API with doc comments. It then exposes two tools to the LLM:
search: Allows the model to search over the pre-resolved OpenAPI spec using a JavaScript async arrow function. This returns only the relevant endpoints, types, and examples instead of stuffing the entire spec into context.
execute: Allows the model to run a JavaScript async arrow function in a sandboxed Dynamic Worker isolate, where it can call endpoints, handle pagination, add conditionals/loops, and compose multi-step workflows.
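The two-tool shape can be sketched without any Cloudflare infrastructure. In the sketch below, `search` and `execute` each accept a model-written async arrow function. `search` hands that function the spec and returns only the slice it selects. `execute` hands it an API object. This is a hypothetical illustration, not Cloudflare's code: the `SpecEndpoint` type, the example paths, and the `Api` interface are invented. The functions also run in-process, so the sketch provides none of the isolation a Dynamic Worker gives you. In a real system the function would arrive as source text and be evaluated inside the sandbox.

```typescript
interface SpecEndpoint {
  method: "GET" | "POST" | "DELETE";
  path: string;
  summary: string;
}

// Hypothetical pre-resolved spec. A real one could have thousands of entries.
const spec: readonly SpecEndpoint[] = [
  { method: "GET", path: "/projects", summary: "List projects" },
  { method: "GET", path: "/projects/{project_id}/issues", summary: "List issues in a project" },
  { method: "POST", path: "/projects/{project_id}/issues", summary: "Create an issue" },
  { method: "DELETE", path: "/projects/{project_id}", summary: "Delete a project" },
];

// Tool 1: the model supplies a query function; only its return value
// goes back into the context, never the whole spec.
async function search<R>(query: (s: readonly SpecEndpoint[]) => Promise<R>): Promise<R> {
  return query(spec);
}

// Minimal, hypothetical API surface exposed to generated code.
interface Api {
  request(method: SpecEndpoint["method"], path: string, body?: unknown): Promise<unknown>;
}

// Tool 2: the model supplies a program that calls the API. A real
// implementation would evaluate it inside an isolated sandbox with controlled
// outbound access; calling it in-process like this offers no isolation.
async function execute<R>(program: (api: Api) => Promise<R>, api: Api): Promise<R> {
  return program(api);
}

// A search the model might send: list issue-related endpoints only.
search(async (s) => s.filter((e) => e.path.endsWith("/issues")).map((e) => `${e.method} ${e.path}`))
  .then((hits) => console.log(hits));

// An execute call the model might send: several requests in a single run.
const fakeApi: Api = { request: async (method, path) => ({ method, path }) };
execute(async (api) => {
  const projects = await api.request("GET", "/projects");
  const created = await api.request("POST", "/projects/p1/issues", { title: "From code mode" });
  return { projects, created };
}, fakeApi).then((result) => console.log(result));
```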
Anthropic introduced a similar approach in Code execution with MCP, using a tree structure to search for viable tools. Same concept, different search implementation.
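Here is a minimal sketch of tree-based discovery, assuming tools are organized as one folder per server with one entry per tool. The layout, server names, descriptions, and helper functions are hypothetical, not Anthropic's implementation. The idea is progressive disclosure: the model first lists servers, then the tools of one server, and reads a single definition only when it needs it. Most of the catalog never enters the context.

```typescript
// Hypothetical in-memory tree: server folders containing one entry per tool.
type ToolFile = { kind: "tool"; description: string };
type Folder = { kind: "folder"; children: Record<string, TreeNode> };
type TreeNode = ToolFile | Folder;

const tree: Folder = {
  kind: "folder",
  children: {
    jira: {
      kind: "folder",
      children: {
        getIssue: { kind: "tool", description: "Fetch a single Jira issue by key" },
        searchIssues: { kind: "tool", description: "Search issues with a JQL query" },
      },
    },
    figma: {
      kind: "folder",
      children: { getFile: { kind: "tool", description: "Fetch a Figma file and its node tree" } },
    },
  },
};

// Resolve a slash-separated path such as "jira/getIssue"; "" is the root.
function resolvePath(root: Folder, path: string): TreeNode | undefined {
  let node: TreeNode = root;
  for (const part of path.split("/").filter((p) => p.length > 0)) {
    // Own-property check so names like "toString" don't resolve via the prototype.
    if (node.kind !== "folder" || !Object.prototype.hasOwnProperty.call(node.children, part)) return undefined;
    node = node.children[part];
  }
  return node;
}

// Steps 1 and 2: list server names, then the tool names inside one server.
function listDir(root: Folder, path: string): string[] {
  const node = resolvePath(root, path);
  return node && node.kind === "folder" ? Object.keys(node.children) : [];
}

// Step 3: read only the one tool definition the model decided it needs.
function readTool(root: Folder, path: string): string | undefined {
  const node = resolvePath(root, path);
  return node && node.kind === "tool" ? node.description : undefined;
}

// Optional keyword search across the whole tree, returning matching paths.
function findTools(node: Folder, keyword: string, prefix = ""): string[] {
  const hits: string[] = [];
  for (const [childName, child] of Object.entries(node.children)) {
    const childPath = prefix ? `${prefix}/${childName}` : childName;
    if (child.kind === "folder") {
      hits.push(...findTools(child, keyword, childPath));
    } else if (`${childName} ${child.description}`.toLowerCase().includes(keyword.toLowerCase())) {
      hits.push(childPath);
    }
  }
  return hits;
}

console.log(listDir(tree, ""), listDir(tree, "jira"), readTool(tree, "jira/getIssue"));
```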
While both patterns I've described live on the agent side, you're not limited to that approach. You can implement the same functionality on the MCP side by:
Adding a search tool alongside your other tools
Making all tools searchable (exposing just one search tool, or all tools plus search)
Adding an execution tool that runs the code the model passes in
The execution tool can live on either side. Search implementations vary: with Anthropic's approach, for example, the search tool traverses a file structure and returns matches. But that's just one option among many.
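On the MCP side, a search tool can be as simple as a keyword match over the server's own tool catalog. The sketch below is hypothetical. `search_tools` is an invented tool name and the catalog is made up. The scoring, which counts how many query words appear in a tool's name or description, is a deliberately naive stand-in for whatever ranking you prefer, such as embeddings or a file-tree walk.

```typescript
interface ToolDefinition { name: string; description: string; inputSchema: Record<string, unknown> }

// Hypothetical catalog of the server's real tools.
const catalog: ToolDefinition[] = [
  { name: "search_issues", description: "Search issues by text or filter", inputSchema: { type: "object" } },
  { name: "create_issue", description: "Create a new issue in a project", inputSchema: { type: "object" } },
  { name: "update_issue", description: "Update fields on an existing issue", inputSchema: { type: "object" } },
  { name: "list_sprints", description: "List sprints on a board", inputSchema: { type: "object" } },
];

// Definition of the search tool itself, advertised in tools/list
// either alone or alongside the regular tools.
const searchToolDefinition: ToolDefinition = {
  name: "search_tools",
  description: "Find tools relevant to a task. Returns tool names, descriptions and input schemas.",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string" }, limit: { type: "number" } },
    required: ["query"],
  },
};

// Handler for search_tools: naive word-overlap ranking over the catalog.
function searchTools(query: string, limit = 5): ToolDefinition[] {
  const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  return catalog
    .map((tool) => {
      const text = `${tool.name} ${tool.description}`.toLowerCase();
      return { tool, score: words.filter((w) => text.includes(w)).length };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((r) => r.tool);
}

console.log(searchToolDefinition.name, searchTools("create issue").map((t) => t.name));
```

Returning full input schemas only for the matches keeps the rest of the catalog out of the context until the model actually asks for it.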
I've seen it repeatedly: new approaches like code mode initially seem to dismiss MCP entirely. But look closer and you'll see that MCP still plays a crucial role thanks to its operability, scalability, and security features. I would also take the token-reduction claims with a grain of salt. You can indeed save tokens with code mode, but only when you have repetitive operations or care about only part of the data structure, which isn't always the case.
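The "part of the data structure" condition is easy to see with a quick comparison. In the hypothetical sketch below, generated code fetches records and returns only two fields of the open ones to the model. Character counts serve as a rough proxy for tokens. The saving comes entirely from that projection. If the task genuinely needs every field of every record, code mode hands back the same payload a direct tool call would.

```typescript
interface Issue { key: string; title: string; status: string; description: string; history: string[] }

// Hypothetical records as an API might return them.
const records: Issue[] = Array.from({ length: 20 }, (_, i) => ({
  key: `PROJ-${i + 1}`,
  title: `Issue ${i + 1}`,
  status: i % 3 === 0 ? "Open" : "Done",
  description: "Long free-text description. ".repeat(10),
  history: ["created", "triaged", "assigned", "updated"],
}));

// Direct tool call: the full result lands in the model's context.
const fullPayload = JSON.stringify(records);

// Code mode, partial need: filter and project inside the sandbox.
const projected = JSON.stringify(
  records.filter((r) => r.status === "Open").map((r) => ({ key: r.key, title: r.title })),
);

// Code mode, full need: nothing to trim, so the payload is identical.
const fullViaCode = JSON.stringify(records);

console.log({ full: fullPayload.length, projected: projected.length, fullViaCode: fullViaCode.length });
```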
The key takeaway? Regardless of which approach you choose, you still need to maintain a reasonable number of tools in your MCP server. What "reasonable" means depends entirely on your use cases; there's no magic number like 5, 20, or 50.
Start with your use cases, iterate on your tool design, and choose the pattern that best fits your constraints. Your AI agent (and your context window) will thank you.
Published by...