Glean 拾遗

Loop Engineering: Let Coding Agents Run Autonomously in the Background While You Design the Loop Itself

Source: x.com · Collected 2026-06-10 06:00 · 14 min read
AI summary

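This piece is Addy Osmani's in-depth look at how engineers will likely work with coding agents. Its core argument is that interacting with coding agents is shifting from hands-on prompt engineering to loop engineering: instead of writing every prompt themselves, engineers design a closed-loop system made of scheduled automations, parallel worktrees, project skills, connector plugins, and checker sub-agents, and let that system find tasks, hand out work, and verify results on its own. The article walks through these five building blocks and their counterparts in Claude Code and Codex, and stresses the traps that need constant attention while a loop runs: verification, comprehension debt, and cognitive comfort. It is aimed at experienced engineers exploring how to turn AI coding tools from one-off assistants into continuous background workers, especially teams that care about code quality, cognitive load, and cost control.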

§ 1

Loop Engineering.

By @addyosmani · 2026-06-08T23:30:34.000Z

Loop engineering is replacing yourself as the person who prompts the agent: you design the system that does it instead. A loop here can be thought of as a recursive goal, where you define a purpose and the AI iterates until it is complete. It comes down to roughly five building blocks, and Claude Code and Codex now both have all five.

§ 2

I believe this may be the future of how we work with coding agents. However, it's still early, I'm skeptical, and you absolutely have to be careful about token costs (usage patterns vary wildly depending on whether you are token-rich or token-poor). You also still need some way to ensure quality doesn't drop, and concerns about slop are valid. That said, let's explore what this is all about.

§ 3

@steipete recently said: “You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents.” Similarly, @bcherny, head of Claude Code at Anthropic, said “I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops”.

§ 4

Okay, so what does any of that mean?

For about two years, the way you got something out of a coding agent was to write a good prompt and share enough context. You type a thing, read what comes back, and type the next thing. The agent is a tool and you are holding it the entire time, one turn after another. That part is kind of over, or at least some people think it soon will be.

Now you build a small system that finds the work, hands it out, checks it, writes down what is done, and then decides the next thing, and you let that system poke the agents instead of you. I've written before about its cousins: agent harness engineering, which is building the environment a single agent runs inside, and the factory model, which is the system that builds the software. Loop engineering sits one floor above the harness: it is the harness, but running on a timer, spawning little helpers, and feeding itself.

§ 5

What surprised me is that this is not really a tooling problem anymore. A year ago, if you wanted a loop, you wrote a pile of bash and maintained that pile forever, and it was yours and only yours. Now the pieces just ship inside the products. Steinberger's list maps almost exactly onto the Codex app, and almost as closely onto Claude Code. Once you notice the shape is the same, you stop arguing about which tool to use and just design a loop that works no matter which one you happen to be sitting in.

§ 6

The five pieces, and then notes

A loop needs five things, plus one place to remember stuff. Let me list them first and then map them to the tools.

  1. Automations that go off on a schedule and do discovery and triage by themselves.
  2. Worktrees so two agents working in parallel don't step on each other.
  3. Skills to write down the project knowledge the agent would otherwise just guess.
  4. Plugins and connectors to plug the agent into the tools you already use.
  5. Sub-agents so one of them has the idea and a different one checks it.

Then the sixth thing, the memory. A markdown file, or a Linear board, anything that lives outside the single conversation and holds what's done and what's next. It sounds too dumb to matter, but it's the same trick every long-running agent depends on, and I went into it in my post on long-running agents: the model forgets everything between runs, so the memory has to be on disk and not in the context. The agent forgets; the repo doesn't.
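
To make "the memory has to be on disk" concrete, here is a minimal sketch that assumes a plain markdown checklist as the state file. The file name LOOP_STATE.md and the "- [ ]" / "- [x]" line format are illustrative choices, not something either tool prescribes. Each run reads the file, picks the first open item, and writes the result back, so nothing depends on what the agent remembers.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical state-file format: one task per line, GitHub-style checkboxes.
ITEM = re.compile(r"^- \[( |x)\] (.+)$")


def load_state(path: Path) -> dict[str, bool]:
    """Return {task: done} parsed from the markdown checklist (empty if missing)."""
    if not path.exists():
        return {}
    state: dict[str, bool] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        m = ITEM.match(line.strip())
        if m:
            state[m.group(2)] = m.group(1) == "x"
    return state


def save_state(path: Path, state: dict[str, bool]) -> None:
    """Write the checklist back to disk so the next run can pick it up."""
    lines = [f"- [{'x' if done else ' '}] {task}" for task, done in state.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def next_open_task(state: dict[str, bool]) -> str | None:
    """First task not yet marked done, in file order."""
    return next((task for task, done in state.items() if not done), None)
```

Because the file is plain markdown, a human can edit it between runs (reorder tasks, tick one off by hand) and the next run simply picks up the change.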

Both products have all five now.

The names are a bit different here and there but the capability is the same thing. Let me go one by one because honestly the details are where a loop either holds together or quietly leaks everywhere.

§ 7

Automations, this is the heartbeat

Automations are what make a loop an actual loop and not just a one-off run. In the Codex app you create one in the Automations tab, picking the project, the prompt it will run, how often it runs, and whether it runs on your local checkout or on a background worktree. Runs that find something go to a Triage inbox, and runs that find nothing archive themselves, which is nice. OpenAI uses them internally for boring stuff like daily issue triage, summarizing CI failures, writing commit briefings, and hunting bugs somebody introduced last week. An automation can also call a skill, which keeps the recurring task maintainable: you fire $skill-name instead of pasting a giant wall of instructions into a schedule that nobody will ever update.

Claude Code gets to the same place through scheduling and hooks. You can run a prompt or a command on an interval with /loop, schedule a cron task, fire shell commands at specific points in the agent lifecycle with hooks, or push the whole thing to GitHub Actions if you want it to keep running after you close the laptop. The idea is exactly the same: you define an autonomous task, give it a cadence, and the findings come to you, so you are not the one going around checking.
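
The cadence-plus-inbox pattern described above can be sketched without reference to either product. This is a hypothetical stand-in, not Codex's or Claude Code's scheduler: run_on_cadence, Finding, and the inbox/archive split are illustrative names. Runs that return findings are queued for a human; empty runs are only counted as archived.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Finding:
    title: str
    detail: str = ""


@dataclass
class RunLog:
    inbox: list[list[Finding]] = field(default_factory=list)  # runs that need a human
    archived: int = 0                                          # runs that found nothing


def run_on_cadence(
    task: Callable[[], list[Finding]],
    interval_s: float,
    iterations: int,
    sleep: Callable[[float], None] = time.sleep,
) -> RunLog:
    """Run task every interval_s seconds, iterations times, routing each result."""
    log = RunLog()
    for i in range(iterations):
        findings = task()
        if findings:
            log.inbox.append(findings)
        else:
            log.archived += 1
        if i < iterations - 1:  # no pointless wait after the final run
            sleep(interval_s)
    return log
```

In practice the long-running schedule would live in the tool, a cron job, or a CI workflow rather than a Python sleep loop; the injectable sleep parameter is there only so the routing logic can be exercised without waiting.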

There is a second in-session primitive worth knowing, and it's closer to what this whole post is about. /loop re-runs on a cadence. /goal keeps going until a condition you wrote is actually true, and after every turn a separate small model checks whether you are done, so the agent that wrote the code isn't the one grading it. You give it something like "all tests in test/auth pass and lint is clean" and walk away. Codex has the same thing, also called /goal: it keeps working across turns until a verifiable stopping condition holds, with pause, resume, and clear. Same primitive in both tools, which is more or less the pattern of this whole article.
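
The /goal idea (keep working until a stated condition holds, and let something other than the worker judge it) fits in a few lines. This is a minimal sketch assuming hypothetical maker and checker callables. In the real tools the checker is a separate model; here it is any function that inspects the resulting state rather than trusting the maker's own claim of success. The turn cap is an added assumption: an unattended loop with no ceiling is how token bills happen.

```python
from __future__ import annotations

from typing import Callable, TypeVar

S = TypeVar("S")


def run_until(
    state: S,
    maker: Callable[[S], S],
    checker: Callable[[S], bool],
    max_turns: int = 10,
) -> tuple[S, bool, int]:
    """Apply maker turn by turn until checker says the goal holds.

    Returns (final_state, goal_met, turns_used). The checker is a separate
    function from the maker, so the worker never grades its own output.
    """
    for turn in range(1, max_turns + 1):
        state = maker(state)
        if checker(state):
            return state, True, turn
    return state, False, max_turns
```

In a real setup the checker might shell out to the test runner and linter for a condition like "all tests in test/auth pass and lint is clean"; what matters is that the condition is verifiable and evaluated outside the maker.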

§ 8

Worktrees so parallel doesn't turn into chaos

The moment you run more than one agent, files start colliding, and that becomes the failure mode. Two agents writing the same file is exactly the same headache as two engineers committing to the same lines without talking to each other first. A git worktree fixes it: it's a separate working directory on its own branch that shares the same repo history, so one agent's edits literally cannot touch the other's checkout.

Codex builds worktree support right in, so several threads can hit the same repo at once without bumping into each other. Claude Code gives you the same isolation with git worktree, a --worktree flag to open a session in its own checkout, and an isolation: worktree setting you put on a subagent so each helper gets a fresh checkout that cleans itself up afterwards. I wrote about the human side of all this in the orchestration tax: worktrees take away the mechanical collisions, but YOU are still the ceiling. Your review bandwidth, not the tool, decides how many agents you can actually run.
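
Both products handle this for you, but for a loop you script yourself the isolation comes down to two git commands: "git worktree add -b <branch> <path>" creates a checkout on its own branch, and "git worktree remove <path>" cleans it up. A minimal sketch, assuming git is on PATH; the sibling-directory layout and the loop/<task_id> branch naming are hypothetical conventions.

```python
from __future__ import annotations

import subprocess
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


def worktree_add_cmd(path: Path, branch: str, base: str = "HEAD") -> list[str]:
    # git worktree add -b <new-branch> <path> <commit-ish>
    return ["git", "worktree", "add", "-b", branch, str(path), base]


def worktree_remove_cmd(path: Path) -> list[str]:
    # --force also removes a worktree with uncommitted changes; drop it if you
    # would rather keep unfinished agent work around for inspection.
    return ["git", "worktree", "remove", "--force", str(path)]


@contextmanager
def isolated_checkout(repo: Path, task_id: str) -> Iterator[Path]:
    """Give one agent its own checkout on its own branch, then clean it up."""
    path = repo.parent / f"{repo.name}-wt-{task_id}"  # hypothetical layout
    branch = f"loop/{task_id}"                          # hypothetical naming
    subprocess.run(worktree_add_cmd(path, branch), cwd=repo, check=True)
    try:
        yield path
    finally:
        subprocess.run(worktree_remove_cmd(path), cwd=repo, check=True)
```

Removing the worktree leaves the loop/<task_id> branch in place, so whatever the agent committed inside the with-block survives for review or a PR.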

§ 9

Skills, so you stop explaining your project every single time

A skill is how you stop re-explaining the same project context every session like a goldfish. Both tools use the same format, a folder with a SKILL.md inside holding instructions and metadata, and then optional scripts, references, assets. Codex runs a skill when you call it with $ or /skills, or by itself when your task matches the skill description, which is the reason a tight boring description beats a clever one. Claude Code does it the same way and I wrote the pattern up in agent skills.
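
The skill layout above can be illustrated with a small sketch: a SKILL.md whose front matter carries the metadata a tool matches tasks against. It assumes front matter with name and description fields, which is the common shape; confirm the exact schema in your tool's documentation. The overlap function is a deliberately naive keyword score, a hypothetical stand-in for the model-driven selection the tools actually perform. It only shows why a plain, specific description gets picked up more reliably than a clever one.

```python
from __future__ import annotations

import re

SKILL_MD = """\
---
name: ci-triage
description: Triage failing CI runs, group failures by test file, and write findings to LOOP_STATE.md
---
1. Read yesterday's CI failures.
2. Group them by test file.
3. Append one checklist line per group to LOOP_STATE.md.
"""


def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (front-matter fields, instruction body).

    Handles only flat "key: value" lines; a real parser would use a YAML library.
    """
    m = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.DOTALL)
    if not m:
        return {}, text
    meta: dict[str, str] = {}
    for line in m.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, m.group(2)


def _words(s: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def overlap(task: str, description: str) -> int:
    """Naive relevance score: number of shared words (illustration only)."""
    return len(_words(task) & _words(description))
```

With the description above, the task "triage the failing CI runs" shares four words; a cute description such as "Your friendly CI whisperer" shares one.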

Skills are also where intent stops costing you over and over. I argued in my post on intent debt that an agent starts every session cold and will fill any hole in your intent with a confident guess. A skill is that intent written down outside the agent: the conventions, the build steps, the “we don't do it like this because of that one incident”, written once where the agent reads it on every run. Without skills, the loop re-derives your whole project from zero every cycle; with skills, the knowledge compounds.

One thing to keep straight: a skill is the authoring format, and a plugin is how you ship it. When you want to share a skill across repos or bundle a few together, you package them as a plugin. That holds in both Codex and Claude Code.

§ 10

Plugins and connectors, the loop touches your real tools

A loop that can only see the filesystem is a tiny loop. Connectors, which are built on MCP, let the agent read your issue tracker, query a database, hit a staging API, or drop a message in Slack. Codex and Claude Code both speak MCP, so a connector you wrote for one usually just works in the other. Plugins bundle connectors and skills together, so a teammate can install your setup in one go instead of rebuilding it from memory.
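
Under the hood, a connector call is an MCP request, which is why one connector works in any MCP-speaking client. As a rough sketch of what crosses the wire, this builds the JSON-RPC 2.0 tools/call message and reads the text items out of a result. The tool name get_issue and its issue_id argument are hypothetical, since each server defines its own tools; in practice you would use an MCP client SDK rather than hand-building messages.

```python
from __future__ import annotations

import json
from itertools import count
from typing import Any

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def tool_call_request(tool: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request (JSON-RPC 2.0)."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_ids),
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
    )


def text_content(response: str) -> str:
    """Join the text items of a tools/call result; raise on protocol or tool errors."""
    msg = json.loads(response)
    if "error" in msg:
        raise RuntimeError(msg["error"].get("message", "JSON-RPC error"))
    result = msg["result"]
    if result.get("isError"):
        raise RuntimeError("tool reported an error")
    return "\n".join(
        item["text"] for item in result.get("content", []) if item.get("type") == "text"
    )
```

The result side keeps only items typed as text; other content types (images, resources) are ignored in this sketch.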

This is the difference between an agent that says “here is the fix” and a loop that opens the PR, links the Linear ticket, and pings the channel on its own once CI is green. Connectors are what let the loop act inside your actual environment instead of just telling you what it would do if it could.
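
The "open the PR, link the ticket, ping once CI is green" behaviour is mostly ordering and gating. A sketch with hypothetical connector callables passed in: open_pr, link_ticket, ci_status, and notify are placeholders for whatever your MCP servers expose, and the status strings are assumed. The design choice worth copying is that the notification is gated on CI, so the channel only hears about green builds.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Connectors:
    # Placeholders for whatever your MCP servers actually expose.
    open_pr: Callable[[str, str], str]       # (branch, title) -> PR URL
    link_ticket: Callable[[str, str], None]  # (ticket id, PR URL)
    ci_status: Callable[[str], str]          # PR URL -> "pending" | "green" | "red"
    notify: Callable[[str], None]            # message -> team channel


def ship(c: Connectors, branch: str, title: str, ticket: str) -> str:
    """Open the PR and link the ticket; announce only if CI is already green.

    Returns the CI status seen, so the caller can requeue pending or red work
    for the next scheduled run instead of pinging anyone.
    """
    pr = c.open_pr(branch, title)
    c.link_ticket(ticket, pr)
    status = c.ci_status(pr)
    if status == "green":
        c.notify(f"{title}: {pr} is green and ready for review")
    return status
```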

§ 11

Sub-agents, keep the maker away from the checker

The most useful structural thing in a loop, by far, is splitting the one who writes from the one who checks. The model that wrote the code is way too nice grading its own homework. A second agent with different instructions and sometimes a different model catches the stuff the first one talked itself into.

Codex only spawns subagents when you ask, runs them at the same time and then folds the results back into one answer. You define your own agents as TOML files in .codex/agents/, each with a name, a description, instructions and optional model and reasoning effort, so your security reviewer can be a strong model on high effort while your explorer is some fast read-only thing. Claude Code does the same with subagents in .claude/agents/ and agent teams that pass work between them. The usual split in both is one agent explores, one implements, one verifies against the spec.
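
The explore / implement / verify split can be written down as data. This sketch mirrors the fields the paragraph lists for Codex agent definitions (name, description, instructions, optional model and reasoning effort) as a Python dataclass. It is not the TOML schema itself. The model names, effort values, and the read_only flag are hypothetical placeholders. The property worth checking is structural: the verifier has its own brief, can use a stronger model, and does not edit what it grades.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    name: str
    description: str
    instructions: str
    model: str | None = None             # None = inherit the session default
    reasoning_effort: str | None = None  # placeholder values such as "low" / "high"
    read_only: bool = False              # illustrative extra, not from any schema


TEAM = [
    AgentSpec("explorer", "Map the code relevant to a task",
              "Read files and summarize; never edit.",
              model="fast-model", reasoning_effort="low", read_only=True),
    AgentSpec("implementer", "Draft a fix in an isolated worktree",
              "Make the smallest change that satisfies the task."),
    AgentSpec("verifier", "Check a draft against the spec and the tests",
              "Assume the draft is wrong until the tests and spec say otherwise.",
              model="strong-model", reasoning_effort="high", read_only=True),
]


def check_split(team: list[AgentSpec]) -> None:
    """Fail fast if the maker could end up grading its own work."""
    by_name = {a.name: a for a in team}
    maker, checker = by_name["implementer"], by_name["verifier"]
    if checker.instructions == maker.instructions:
        raise ValueError("the verifier needs its own instructions")
    if not checker.read_only:
        raise ValueError("the verifier should not edit what it grades")
```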

I've made this case twice already, once as the code agent orchestra and once as adversarial code review. It matters specifically inside a loop because the loop runs while you are not watching, so a verifier you actually trust is the only thing that lets you walk away. Subagents do burn more tokens, since each one does its own model and tool work, so spend them where a second opinion is worth paying for. This is also basically what Claude Code's /goal does under the hood: a fresh model, not the one that did the work, decides whether the loop is done. It's the maker-checker split applied to the stop condition itself.

§ 12

What one loop looks like

Stick it together and a single thread turns into a little control panel. Here is one shape I keep using.

An automation runs on the repo every morning. Its prompt calls a triage skill that reads yesterday's CI failures, the open issues, and the recent commits, and writes the findings into a markdown file or a Linear board. For each finding worth doing, the thread opens an isolated worktree and sends a sub-agent to draft the fix, and a second sub-agent reviews that draft against the project skills and the existing tests.

Connectors let the loop open the PR and update the ticket. Anything the loop cannot handle lands in the triage inbox for me. The state file is the spine of the whole thing: it remembers what was tried, what passed, and what is still open, so tomorrow morning's run picks up where today's stopped.
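
Wired together, one morning run of the loop described above might look like this sketch. Every callable (triage, draft_fix, review, open_pr) is a hypothetical placeholder for a skill, sub-agent, or connector, and a JSON file stands in for the markdown file or Linear board for brevity. The parts that carry the design are that the reviewer is separate from the drafter, that a finding is escalated to the human inbox after a bounded number of failed reviews, and that the state is written even if a call fails mid-run, so tomorrow's run skips what is done and retries what is still open.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable

Draft = str  # placeholder for whatever a drafting sub-agent returns (diff, branch name)


def morning_run(
    state_path: Path,
    triage: Callable[[], list[str]],       # triage skill -> finding ids
    draft_fix: Callable[[str], Draft],     # sub-agent 1, working in its own worktree
    review: Callable[[str, Draft], bool],  # sub-agent 2, checks against skills and tests
    open_pr: Callable[[str, Draft], str],  # connector -> PR URL
    max_attempts: int = 2,
) -> list[str]:
    """Run one cycle of the loop and return findings escalated to the human inbox."""
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    inbox: list[str] = []
    try:
        for finding in triage():
            entry = state.setdefault(finding, {"status": "open", "attempts": 0})
            if entry["status"] in ("pr_opened", "escalated"):
                continue  # handled on an earlier run; the state file remembers
            entry["attempts"] += 1
            draft = draft_fix(finding)
            if review(finding, draft):
                entry.update(status="pr_opened", pr=open_pr(finding, draft))
            elif entry["attempts"] >= max_attempts:
                entry["status"] = "escalated"
                inbox.append(finding)
            # otherwise it stays "open" and the next run retries it
    finally:
        # Persist even if a sub-agent or connector call raised mid-run.
        state_path.write_text(json.dumps(state, indent=2, sort_keys=True))
    return inbox
```

Run on three consecutive days with a finding whose draft never passes review, the first run leaves it open, the second escalates it to the inbox, and the third does nothing.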

And look at what you actually did there. You designed it once. You did not prompt any of those steps. That's Steinberger's whole point made real, and it's the same loop in Codex or in Claude Code because the pieces are the same pieces.

§ 13

What the loop still does not do for you

The loop changes the work; it does not remove you from it. And three problems actually get sharper as the loop gets better, not easier.

Verification is still on you. A loop running unattended is also a loop making mistakes unattended. The whole reason you split the verifier sub-agent from the maker is to make the loop's “it's done” mean something, and even then “done” is a claim, not a proof. I keep repeating the same line from code review in the age of AI: your job is to ship code you have confirmed works.

Your understanding still rots if you let it. The faster the loop ships code you did not write, the bigger the gap between what exists and what you actually understand. That's comprehension debt, and a smooth loop just makes it grow faster unless you read what the loop made.

And yes, the comfortable posture is probably the risky one. When the loop runs itself, it's very tempting to stop having an opinion and just accept whatever it gives back. I called that cognitive surrender. Designing the loop is the cure when you do it with judgement and the accelerant when you do it to avoid thinking: same action, opposite result.

§ 14

Build the loop. Stay the engineer.

I think this is a preview of how our work is going to evolve. Still, if I weren't reviewing the code myself, or if I relied entirely on automated loops to fix it, my product's quality would suffer. I'd likely end up in a downward spiral, digging myself into an ever deeper hole.

So go ahead and set up your loops, but don't forget that prompting your agents directly is still effective. It's all about finding the right balance.

Loops can also result in different outcomes depending on you. Two people can build the exact same loop and get completely opposite results. One uses it to move faster on work they understand deeply. The other uses it to avoid understanding the work at all. The loop doesn't know the difference. You do.

That's what makes loop design harder than prompt engineering, not easier. Cherny's point isn’t that the work got easier. It's that the leverage point moved.

Build the loop. But build it like someone who intends to stay the engineer, not just the person who presses go.
