循环工程:当提示不再是主角,Agent 系统的核心转向
本文由 Claude Code 构建者 Boris Cherny 的观点切入,提出 Agent 开发的重心已从提示工程转向循环工程(Loop Engineering)。作者详细拆解了 Agent 循环的内核(一个简短的 while 循环),并指出真正的工程挑战集中在四个环节:如何准确判定任务完成(而非模型停用工具)、如何保持上下文清洁以防止“上下文腐烂”、如何设计让 Agent 能实际使用的工具(幂等性与面向 LLM 的错误信息)、以及如何在循环中引入独立的验证者(Critic)来避免模型自我认可。文章强调,模型正趋于同质化,围绕模型的“马具”(Harness)——即循环系统——才是工程师应投入精力的方向。适合 Agent 开发、AI 工程与系统设计的相关工程人员阅读。
Half your feed is suddenly saying the same thing. Stop prompting your agents, start engineering loops.
Boris Cherny, the person who built Claude Code, said it plainly: "I don't prompt Claude anymore. I have loops that are running. My job is to write loops."
The person who builds one of the most popular coding agents on earth doesn't prompt it. So what is he doing instead?
That's the whole idea behind loop engineering. Now let's break down why it's harder than it looks.
你信息流里一半的人突然都在说同一件事:别再为 agent 写提示词了,开始设计循环吧。
Claude Code 的缔造者 Boris Cherny 直言:“我不再给 Claude 写提示了。我的循环一直在跑。我的工作就是写循环。”
一个全球最热门编程 agent 的创造者都不写提示了,那他在做什么?
这就是循环工程(loop engineering)背后的全部理念。下面我们来拆解,为什么它比看上去要难得多。
An agent isn't a magic box. At its core, it's a plain loop:
while True:
response = model(context)
if response.has_tool_calls():
results = run_tools(response.tool_calls)
context += results
else:
break
The model reads the context. It asks to call a tool. You run the tool and feed the result back. The model reads again, and this repeats until it stops asking for tools.
Model → tools → context → repeat.
Here's the part that surprises people. This loop is already solved. Every serious agent framework lands on roughly these six lines. Nobody is competing on the while statement.
So if the loop is trivial, what is everyone actually engineering?
agent 不是魔法黑箱。它的核心就是一个简单的循环:
while True:
response = model(context)
if response.has_tool_calls():
results = run_tools(response.tool_calls)
context += results
else:
break
模型读取上下文。它请求调用某个工具。你执行工具并将结果反馈回去。模型再次读取,如此重复,直到它不再请求调用工具。
模型 → 工具 → 上下文 → 重复。
接下来这一点可能会让你意外:这个循环其实已经解决了。每个正经的 agent 框架最终都会落到类似这样六行代码上。没有人在 while 语句上竞争。
既然循环本身很简单,那大家到底在工程化什么?
The center of gravity in AI keeps drifting away from the model itself.
- Prompt engineering. The words you send.
- Context engineering. Everything the model sees, not just your instructions.
- Harness engineering. The code around the model that runs tools, tracks state, and handles errors.
- Loop engineering. The autonomous cycle that drives the whole thing toward a goal. Each layer wraps the one before it. You didn't stop caring about prompts. You just realized the prompt is one small piece of a much bigger system.
LangChain puts it cleanly. Agent = Model + Harness. If you're not the model, you're the harness.
And here's the finding that should reorder your priorities. The harness now matters more than the model. Teams have kept the model fixed, changed only the code around it, and jumped from the middle of a benchmark into the top five. Same brain, different loop.
Loop engineering is the discipline of building everything that brain runs inside. Let me show you the parts that actually break.
AI 的重心一直在从模型本身向外漂移。
- 提示词工程(Prompt engineering):你发送的文字。
- 上下文工程(Context engineering):模型能看到的一切,不仅仅是你的指令。
- 框架工程(Harness engineering):围绕模型运行的代码,负责执行工具、追踪状态、处理错误。
- 循环工程(Loop engineering):驱动整个系统朝向目标的自主循环。 每一层都包裹着前一层。你并非不再关心提示词,只是意识到提示词只是一个更大系统中的一小部分。
LangChain 说得简洁:Agent = 模型 + 框架。如果你不是模型,你就是框架。
下面这个发现应该让你重新排列优先级:框架现在比模型更重要。有些团队保持模型不变,只改了外围代码,就从基准测试的中游直接跃升到前五名。同一个大脑,不同的循环。
循环工程就是构建那个大脑运行环境的学问。让我告诉你,哪些环节才是真正容易出问题的。
This is the problem nobody warns you about.
When an agent stops asking for tools, it has ended its turn. That is not the same as finishing the job.
Picture a coding agent. It writes some code, glances around, sees that progress was made, and announces it's done. The tests still fail. It declared victory anyway.
A terminal message ends the turn, not the task. Confusing those two is the most common way loops go wrong.
Good loops stop for the right reasons, so you layer several brakes:
- Max iterations. A hard cap so a stuck agent can't run forever.
- Budget and time limits. A ceiling on tokens, money, and seconds.
- No-progress detection. If it repeats the same call with the same arguments, it's spinning.
- A real completion check. An automated condition proving the job is done. That last one carries the weight. "Done" should mean the tests pass, not the agent feeling good about its work.
这是一个没人提醒过你的问题。
当 agent 不再请求工具,它只是结束了本轮交互,这和完成任务是两码事。
想象一个编程 agent:它写了一些代码,四处看了看,发现有了进展,就宣布完成了。但测试仍然失败——它还是宣布胜利。
一条终止消息结束的是本轮交互,而不是任务。混淆这两者,是循环出错最常见的原因。
好的循环会因为正确的原因停止,因此你需要叠加几层刹车机制:
- 最大迭代次数:硬上限,防止卡住的 agent 无限运行。
- 预算与时间限制:对 token、费用和时间的上限。
- 无进展检测:如果 agent 用相同的参数重复调用同一个工具,说明它在原地打转。
- 真正的完成检查:一个自动化条件,能够证明任务确实完成了。 最后一条最为关键。“完成”应该意味着测试通过,而不是 agent 自我感觉良好。
Long loops rot from the inside.
The more turns an agent takes, the more junk piles into its context, like old tool outputs, dead ends, and stale reasoning. Model performance drops as that pile grows. The field calls it context rot.
A loop makes it spiral. A rotted context produces a worse decision, which adds more noise, which rots the context further. People call this the doom loop, and you've felt it. The agent gets dumber the longer it runs.
You fight it by treating context as a budget, not a bucket:
- Compaction. Summarize the conversation when it gets long, then continue from the summary.
- Offloading. Push huge outputs to a file and keep only the slice you need.
- Sub-agents. Hand a messy subtask to a separate agent and let only its clean result return. The instinct is to keep everything, just in case. The skill is knowing what to throw away.
长时间运行的循环会从内部腐烂。
agent 交互的轮次越多,上下文里堆积的垃圾就越多——旧的工具输出、死胡同、过时的推理。随着垃圾堆积,模型性能就会下降。业内称之为上下文腐烂(context rot)。
循环会让它螺旋恶化:腐烂的上下文导致更差的决策,更差的决策又带来更多噪声,进一步加剧上下文腐烂。人们称之为“厄运循环”(doom loop),你应该也体会过——agent 跑得越久就越笨。
应对的办法是把上下文视为预算,而不是一个容器:
- 压缩:当对话变长时,先总结对话,然后从总结处继续。
- 卸载:将巨大的输出推送到文件中,只保留你需要的那一部分。
- 子 agent:将混乱的子任务交给另一个 agent,只让干净的最终结果返回。 本能反应是保留一切,以防万一。真正的技巧在于知道该丢弃什么。
A loop is only as good as the tools inside it.
Pile on a hundred tools and the agent loses track of which one to reach for. A tight set of focused, non-overlapping tools wins. Anthropic's rule of thumb is sharp. If a human engineer can't say for certain which tool fits, the agent has no chance.
Two things matter more than people expect:
- Make writes safe to repeat. Loops retry, and if a retried "create customer" call makes a second customer, you'll wake up to duplicate records and double billing. Anything that changes state has to be safe to call twice.
- Write error messages for the agent, not the human. A good error tells the agent what to do next. Before a tool ships, ask whether an LLM reading its error would know the next move. In a loop, an error isn't a dead end. It's the next instruction.
循环的质量取决于它内部的工具。
塞进去一百个工具,agent 就不知道该用哪一个了。一组紧凑、聚焦且不重叠的工具才是最优解。Anthropic 的经验法则很犀利:如果一个人类工程师都不能确定该用哪个工具,那 agent 更没戏。
有两件事比人们预想的更重要:
- 让写操作可重复安全执行。循环会重试。如果重试的“创建客户”调用生成了第二个客户,你就会醒来看到重复记录和双倍账单。任何改变状态的操作都必须能够安全地调用两次。
- 错误信息要写给 agent,而不是人。一个好的错误应该告诉 agent 下一步该怎么做。在发布一个工具之前,先问自己:一个 LLM 读到它的错误信息后,知道下一步是什么吗? 在循环中,错误不是死路,而是下一条指令。
Autonomous loops have a quiet failure mode. An agent left alone tends to agree with itself.
The sharpest comment in the whole debate nailed it. Designing the loop is half the job, and the other half is putting something in the loop that can say no, like a test, a type check, or a real error.
A loop with no critic is just an agent nodding along to its own work.
The fix is to separate the maker from the checker. One model does the work. A different check, often a separate model or a hard test, grades it. The worker doesn't grade its own homework.
自主循环有一个隐蔽的失败模式:被单独留着的 agent 往往会同意自己的看法。
整个讨论中最精辟的评论点明了关键:“设计循环只完成了工作的一半,另一半是在循环中加入一个能说‘不’的东西——比如测试、类型检查或真正的错误。”
一个没有批评者的循环,不过是一个对自己的工作点头赞许的 agent。
解决办法是将“制造者”与“检查者”分开。一个模型负责工作,另一个不同的检查机制——通常是另一个模型或一个严格的测试——来评估它。工人不能给自己的作业打分。
Now Cherny's quote makes sense.
Prompting is you steering the agent move by move. Loop engineering is you building the system that steers it, then stepping back.
Your job changes from giving instructions to designing three things:
- The goal, written as success criteria the agent can check itself against.
- The loop, with sane brakes so it stops well.
- The verifier, so "done" is proven, not claimed. Andrej Karpathy captures the mindset. Don't tell the model what to do, give it success criteria and watch it go. He runs research loops overnight that tweak a script, test it, keep what works, and discard what doesn't, with himself nowhere in the loop. He arranges it once and hits go.
That's the whole move. You stop being the hands and become the person who designs the machine.
现在 Cherny 的那句话就说得通了。
提示词是你一步步地引导 agent;循环工程则是你构建一个系统来引导它,然后自己退后一步。
你的工作从下达指令转变为设计三件事:
- 目标:写成一组合格标准(success criteria),agent 可以据此自我检查。
- 循环:配上合理的刹车机制,让它能适时停下。
- 验证器:让“完成”成为可被证明的事实,而不是一个声称。 Andrej Karpathy 精准地概括了这种心态:“别告诉模型该做什么,给它合格标准,然后看它自己动起来。”他整夜运行研究循环,让脚本自行调整、测试、保留有效部分、丢弃无效部分,而他自己完全不在循环里。他只安排一次,然后按下启动键。
这就是整个转变的要点:你不再是那双手,而是设计那台机器的人。
You don't need an overnight autonomous agent on day one. Build up to it:
- Start with the basic loop, and add a max-iteration cap, a timeout, and a cost ceiling right away.
- Define "done" as an automated check before you begin, not a vibe afterward.
- Protect the context. Compact long runs, offload big outputs, isolate messy subtasks.
- Audit your tools. Keep them few and focused, make writes safe to repeat, and rewrite errors so an agent can act on them.
- Put a critic in the loop. Only go fully hands-off once you trust the thing that says no.
你不必第一天就搞出一个整夜运行的自主 agent。可以循序渐进:
- 从基本循环开始,同时立即添加最大迭代次数限制、超时和成本上限。
- 在开始之前就把“完成”定义为一项自动化检查,而不是事后凭感觉判断。
- 保护上下文:压缩长时间运行的内容,卸载大型输出,隔离混乱的子任务。
- 审查你的工具:保持少量且聚焦,确保写操作可重复安全执行,重写错误信息以便 agent 能够据此行动。
- 在循环中加入一个批评者。只有在你能信任那个说“不”的机制之后,才完全放手。
Loop engineering isn't a framework or a tool you install. It's a shift in where you aim your effort.
The model is becoming a commodity. The loop around it is where the real engineering lives now.
The best builders stopped asking "what should I tell the agent to do?" They started asking "what system would do this without me?"
Answer that one well, and you'll stop prompting too.
Here's a summary of

Thanks for reading!
Cheers! Akshay.
循环工程不是某个框架或工具,需要你安装。它是指引你投入精力的方向转变。
模型正在变成一种商品。模型外围的循环,才是真正的工程所在。
最优秀的构建者已经不再问“我该告诉 agent 做什么?”,而是开始问“什么样的系统能在没有我的情况下完成这件事?”
如果你能很好地回答这个问题,你也会停止写提示词。
以下是要点总结

感谢阅读!
祝好, Akshay。