Loop Engineering
Loop Engineering proposes a shift from hand-prompting coding agents to designing autonomous loops: a system with five components (scheduled automations for discovery, worktrees for parallel isolation, skills to codify project context, plugins/connectors via MCP, and verifier sub-agents) that lets agents iterate without manual intervention. The post maps these primitives across Codex and Claude Code, noting that memory persisted outside the conversation (via AGENTS.md or Linear) is the critical sixth piece. The core insight is that loop design is harder than prompt engineering—the engineer's role moves from operator to system architect, while verification burden, comprehension debt, and cognitive surrender remain unresolved challenges that the loop itself cannot eliminate.
Loop engineering is replacing yourself as the person who prompts the agent. You design the system that does it instead. A loop here can be thought of as a recursive goal: you define a purpose and the AI iterates until it is complete. I believe this may be the future of how we work with coding agents. However, it’s still early, I’m skeptical, and you absolutely have to be careful about token costs (usage patterns can vary wildly depending on whether you are token rich or token poor), so I want to unpack what it is and what it means.
Peter Steinberger recently said: “You shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.” Similarly, Boris Cherny, head of Claude Code at Anthropic, said “I don’t prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops”.
For like two years the way you got something out of a coding agent was you wrote a good prompt and shared enough context. You type a thing, you read what came back, you type the next thing. The agent is a tool and you are holding it the entire time, one turn after the other. That part is kind of over, or at least some think it’s going to be. Now you build a small system that finds the work, hands it out, checks it, writes down what is done and then decides the next thing, and you let that system poke the agents instead of you. I wrote before about the cousins of this: agent harness engineering, which is building the environment a single agent runs inside, and the factory model, the system that builds the software. Loop engineering sits one floor above the harness. It is the harness, but it runs on a timer, it spawns little helpers, and it feeds itself.
The thing that surprised me is this is not really a tool thing anymore. A year ago if you wanted a loop you wrote a pile of bash and you maintained that pile forever and it was yours and only yours. Now the pieces just ship inside the products. Steinberger’s list maps almost exactly onto the Codex app, and then almost the same onto Claude Code. And once you notice the shape is the same you stop arguing about which tool, you just design a loop that still works no matter which one you happen to be sitting in.
A loop needs five pieces, plus one place to keep state. Automations that go off on a schedule and do discovery and triage by themselves. Worktrees so two agents working in parallel don’t step on each other. Skills to write down the project knowledge the agent would otherwise just guess. Plugins and connectors to plug the agent into the tools you already use. Sub-agents so one of them has the idea and a different one checks it. Then the sixth thing, the memory. A markdown file, or a Linear board, anything that lives outside the single conversation and holds what’s done and what is next. Sounds too dumb to matter. But it’s the same trick every long-running agent depends on, and I went into it in long-running agents: the model forgets everything between runs, so the memory has to be on disk and not in the context. The agent forgets, the repo doesn’t.
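To make the memory piece concrete, here is a minimal sketch that keeps loop state in a JSON file instead of the markdown file or Linear board described above. The file layout and the `done` / `next` field names are assumptions made for illustration; neither tool prescribes them.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read loop state from disk, or start empty on the very first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"done": [], "next": []}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file first, then swap it in, so a crashed run
    cannot leave a half-written state file behind."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)


def complete_task(state: dict, task: str) -> dict:
    """Return a new state with `task` moved from 'next' to 'done'."""
    remaining = [t for t in state["next"] if t != task]
    done = state["done"] + ([task] if task not in state["done"] else [])
    return {"done": done, "next": remaining}
```

Each run would call `load_state` at the start and `save_state` at the end, so tomorrow's run starts from what today's run recorded rather than from an empty context.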
Automations are what make a loop an actual loop and not just one run you did once. In the Codex app you make one in the Automations tab and you pick the project, the prompt it will run, how often, and whether it runs on your local checkout or on a background worktree. The runs that find something go to a Triage inbox, and the runs that find nothing just archive themselves, which is nice. OpenAI uses them internally for boring stuff like daily issue triage, summarising CI failures, writing commit briefings, hunting bugs somebody added last week. And an automation can call a skill, so you keep the recurring thing maintainable: you fire $skill-name instead of pasting a giant wall of instructions into a schedule that nobody will ever update.
Claude Code gets to the same place but through scheduling and hooks. You can run a prompt or a command on an interval with /loop, you can schedule a cron task, you can fire shell commands at certain points in the agent lifecycle with hooks, or you push the whole thing to GitHub Actions if you want it to keep running after you close the laptop. Same idea exactly: you define an autonomous task, you give it a cadence, and the findings come to you so you are not the one going around checking. There is a second in-session primitive worth knowing, and it’s the one closer to what this whole post is about. /loop re-runs on a cadence. /goal keeps going until a condition you wrote is actually true, and after every turn a separate small model checks whether you are done, so the agent that wrote the code isn’t the one grading it. You give it something like “all tests in test/auth pass and lint is clean” and walk away. Codex has the same thing, also called /goal: it keeps working across turns until a verifiable stopping condition holds, with pause and resume and clear. Same primitive, both tools, which is kind of the pattern for this whole article. So this is the part that surfaces the work. The rest of the loop is what acts on it.
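The run-until-done shape can be sketched in a few lines. This is not how either tool implements /goal; `step` and `is_done` are hypothetical callables standing in for the working agent and the separate checker, and `max_turns` is an added assumption to cap token spend.

```python
from typing import Callable


def run_until(
    step: Callable[[int], str],
    is_done: Callable[[str], bool],
    max_turns: int = 10,
) -> tuple[bool, int]:
    """Call `step` repeatedly until `is_done` accepts its output.

    The worker (`step`) never decides that it is finished; only the
    checker (`is_done`) can end the loop early.
    Returns (goal_met, turns_used).
    """
    for turn in range(1, max_turns + 1):
        output = step(turn)
        if is_done(output):
            return True, turn
    # Budget exhausted without the checker agreeing: report failure.
    return False, max_turns
```

A cadence loop like /loop differs only in the exit: it runs the step on every tick regardless of the result, while this shape stops as soon as the condition holds or the budget runs out.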
The second you run more than one agent the files start colliding, and that becomes the failure mode. Two agents writing the same file is the exact same headache as two engineers committing to the same lines when nobody talked to each other first. A git worktree fixes it: it’s a separate working directory on its own branch sharing the same repo history, so one agent’s edits literally cannot touch the other one’s checkout. Codex builds the worktree support right in so several threads hit the same repo at once and don’t bump into each other. Claude Code gives you the same isolation with git worktree, a --worktree flag to open a session in its own checkout, and an isolation: worktree setting you stick on a subagent so each helper gets a fresh checkout that cleans itself up after. I wrote about the human side of all this in the orchestration tax: the worktrees take away the mechanical collision but YOU are still the ceiling, your review bandwidth decides how many you can actually run, not the tool.
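If you are wiring this up by hand rather than using the built-in support, the underlying git command is small. A minimal sketch, assuming `git` is on the PATH; the sibling-directory layout and the `agent/<name>` branch naming are illustrative choices, not a convention of either tool.

```python
import subprocess
from pathlib import Path


def worktree_add_command(path: Path, branch: str) -> list[str]:
    """Build the git command that creates `branch` and checks it out at `path`."""
    return ["git", "worktree", "add", "-b", branch, str(path)]


def create_worktree(repo: Path, name: str) -> Path:
    """Create a sibling checkout for one agent; raises if git fails."""
    path = repo.parent / f"{repo.name}-{name}"
    subprocess.run(worktree_add_command(path, f"agent/{name}"), cwd=repo, check=True)
    return path
```

Each agent then gets its own directory from `create_worktree`, and `git worktree remove <path>` cleans up once its branch has been merged or abandoned.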
A skill is how you stop re-explaining the same project context every session like a goldfish. Both tools use the same format, a folder with a SKILL.md inside holding instructions and metadata, and then optional scripts, references, assets. Codex runs a skill when you call it with $ or /skills, or by itself when your task matches the skill description, which is the reason a tight boring description beats a clever one. Claude Code does it the same way and I wrote the pattern up in agent skills. Skills are also where intent stops costing you over and over. I argued in the intent debt that an agent starts every session cold and it will fill any hole in your intent with a confident guess. A skill is that intent written down on the outside, the conventions, the build steps, the “we don’t do it like this because of that one incident”, written one time where the agent reads it every run. Without skills the loop re-derives your whole project from zero every cycle, with skills it kind of compounds. One thing to keep straight: the skill is the authoring format and a plugin is how you ship it. When you want to share a skill across repos or bundle a few together you package them as a plugin. True in Codex, true in Claude Code.
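As a rough illustration of the folder layout, the sketch below scaffolds a skill directory containing a SKILL.md. It assumes the metadata sits in a YAML-style header with `name` and `description` fields; treat that header, and the `scaffold_skill` helper itself, as assumptions and check your tool's documentation for the exact schema.

```python
from pathlib import Path


def scaffold_skill(root: Path, name: str, description: str, body: str) -> Path:
    """Create <root>/<name>/SKILL.md with metadata on top and instructions below.

    Note: a description containing a colon would need quoting to stay valid YAML.
    """
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(
        f"---\nname: {name}\ndescription: {description}\n---\n\n{body.strip()}\n",
        encoding="utf-8",
    )
    return skill_file
```

The description is the part that decides whether the skill fires implicitly, so it is worth keeping literal, for example "Summarise failed CI runs from the last 24 hours" rather than something clever.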
A loop that can only see the filesystem is a tiny loop. Connectors, which are built on MCP, let the agent read your issue tracker, query a database, hit a staging API, drop a message in Slack. Codex and Claude Code both speak MCP so the connector you wrote for one usually just works in the other. And plugins bundle connectors and skills together so your teammate installs your setup in one go instead of rebuilding the whole thing from memory. This is the difference between an agent that says “here is the fix” and a loop that, by itself, opens the PR, links the Linear ticket and pings the channel once CI is green. The connectors are the reason the loop can act inside your actual environment instead of just telling you what it would do if it could.
The most useful structural thing in a loop, by far, is splitting the one who writes from the one who checks. The model that wrote the code is way too nice grading its own homework. A second agent with different instructions and sometimes a different model catches the stuff the first one talked itself into. Codex only spawns subagents when you ask, runs them at the same time and then folds the results back into one answer. You define your own agents as TOML files in .codex/agents/, each with a name, a description, instructions and optional model and reasoning effort, so your security reviewer can be a strong model on high effort while your explorer is some fast read-only thing. Claude Code does the same with subagents in .claude/agents/ and agent teams that pass work between them. The usual split in both is one agent explores, one implements, one verifies against the spec.
I made this case twice already, once as the code agent orchestra and once as adversarial code review. The reason it matters specifically inside a loop is the loop runs while you are not watching, so a verifier you actually trust is the only reason you can walk away. Subagents do burn more tokens since each one does its own model and tool work, so spend them where a second opinion is worth paying for. This is also basically what Claude Code’s /goal does under the hood, a fresh model decides if the loop is done instead of the one that did the work, the maker and checker split applied to the stop condition itself.
An automation runs every morning on the repo. Its prompt calls a triage skill that reads yesterday’s CI failures, the open issues, the recent commits, and writes the findings into a markdown file or a Linear board. For each finding that is worth doing the thread opens an isolated worktree and sends a sub-agent to draft the fix, and a second sub-agent reviews that draft against the project skills and the existing tests. Connectors let the loop open the PR and update the ticket. Anything the loop cannot handle lands in the triage inbox for me. The state file is the spine of the whole thing: it remembers what got tried, what passed, what is still open, so tomorrow morning the run picks up where today stopped. And look at what you actually did there. You designed it one time. You did not prompt any of those steps. That’s Steinberger’s whole point made real, and it’s the same loop in Codex or in Claude Code because the pieces are the same pieces.
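The walkthrough above reduces to a small control flow. In this sketch `draft`, `review`, and `open_pr` are hypothetical callables standing in for the drafting sub-agent, the reviewing sub-agent, and the connector, and `LoopState` is an in-memory stand-in for the state file; none of these names come from Codex or Claude Code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopState:
    pr_opened: list[str] = field(default_factory=list)  # findings that became PRs
    inbox: list[str] = field(default_factory=list)      # findings left for a human


def run_cycle(
    findings: list[str],
    draft: Callable[[str], str],
    review: Callable[[str, str], bool],
    open_pr: Callable[[str, str], None],
    state: LoopState,
) -> LoopState:
    """One pass: draft a fix per finding, verify it, then act or escalate."""
    for finding in findings:
        if finding in state.pr_opened:
            continue  # already handled on an earlier run
        patch = draft(finding)
        if review(finding, patch):
            open_pr(finding, patch)
            state.pr_opened.append(finding)
        elif finding not in state.inbox:
            state.inbox.append(finding)  # the loop could not verify it: hand it over
    return state
```

Because the state is passed in and returned, running the same cycle again tomorrow skips what already has a PR and does not pile duplicates into the inbox.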
The loop changes the work, it does not delete you from it. And three problems actually get sharper as the loop gets better, not easier. Verification is still on you. A loop running unattended is also a loop making mistakes unattended. The whole reason you split the verifier sub-agent from the maker is to make the loop’s “it’s done” mean something, and even then “done” is a claim and not a proof. I keep saying the same line from code review in the age of AI: your job is to ship code you confirmed works. Your understanding still rots if you allow it. The faster the loop ships code you did not write, the bigger the gap between what exists and what you actually understand. That’s comprehension debt, and a smooth loop just makes it grow faster unless you read what the loop made. And the comfortable posture is the dangerous one. When the loop runs itself it’s very tempting to stop having an opinion and just take whatever it gives back. I called that cognitive surrender. Designing the loop is the cure when you do it with judgement and the accelerant when you do it to avoid thinking: same action, opposite result.
I think this is a preview of how our work is going to evolve. That said, if I weren’t reviewing the code myself, or if I relied entirely on automated loops to fix it, my product’s quality would suffer. I’d likely end up stuck in a downward spiral, continuously digging myself into a deeper hole. So go ahead and set up your loops, but don’t forget that prompting your agents directly is also effective. It’s all about finding the right balance. Loops can also result in different outcomes depending on you. Two people can build the exact same loop and get completely opposite results. One uses it to move faster on work they understand. The other uses it to avoid understanding the work at all. The loop doesn’t know the difference. You do. That’s what makes loop design harder than prompt engineering, not easier. Cherny’s point isn’t that the work got easier. It’s that the leverage point moved. Build the loop. But build it like someone who intends to stay the engineer, not just the person who presses go.
| Primitive | Job in the loop | Codex app | Claude Code |
|---|---|---|---|
| Automations | discovery + triage on a schedule | Automations tab: pick project, prompt, cadence, environment; results land in a Triage inbox; /goal for run-until-done | Scheduled tasks and cron, /loop, /goal, hooks, GitHub Actions |
| Worktrees | isolate parallel features | Built-in worktree per thread | git worktree, --worktree, isolation: worktree on a subagent |
| Skills | codify project knowledge | Agent Skills (SKILL.md), invoked with $name or implicitly | Agent Skills (SKILL.md) |
| Plugins / connectors | connect your tools | Connectors (MCP) plus plugins for distribution | MCP servers plus plugins |
| Sub-agents | ideate and verify | Subagents defined as TOML in .codex/agents/ | Task subagents in .claude/agents/, agent teams |
| State | track what’s done | Markdown or Linear via a connector | Markdown (AGENTS.md, progress files) or Linear via MCP |