Glean · Daily · 2026-06-18

Building a Self-Improvement Loop for Agent Skills: Inner and Outer Loops with Cloud Agents in Practice

Original: x.com · Saved 2026-06-18 06:00 · 4 min read
AI summary

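This article shows how inner and outer agent loops let Skills improve themselves. The inner loop fires on every new GitHub Issue: a GitHub Action triggers a cloud agent, which runs the triage Skill and applies a label. The outer loop runs once a day, reviews every label and comment a human corrected, automatically generates a diff that updates the Skill file, and merges it back into the main branch. Using issue triage as the example, the author gives a full configuration and code samples on Warp's Oz cloud agent platform and provides a reproducible sample repo. The approach also applies to code review, bug fixing, incident response, and similar scenarios. It suits engineers who are building AI agents and want their Skills to keep improving.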

§ 1

There’s been a lot of chatter lately about using “loops” to drive agents, and I think it has come with some confusion about what a loop actually is.

I can’t speak for everyone else using the term, but I wanted to show a practical approach using Skills and cloud agents for a particularly powerful kind of loop: a self-improvement loop.

This is the idea that an agent can improve the quality of its own Skills over time from external feedback. My example is a loop that involves a human feedback step, but if you have a clear goal that doesn’t require a human, you can use the same method with an automated grader.
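If the human is swapped for an automated grader, the grader only has to say which runs were wrong. Below is a minimal Python sketch, assuming each run has an expected label to compare against (for example, from a hand-labeled set of issues); `Run` and `grade` are hypothetical names, not part of Oz or any library.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # One inner-loop run: the agent's label and the label the grader expects.
    # (Hypothetical record shape; the grading source is an assumption.)
    issue_number: int
    agent_label: str
    expected_label: str


def grade(runs: list[Run]) -> tuple[float, list[Run]]:
    """Return the fraction of correct runs and the list of incorrect ones."""
    if not runs:
        # Nothing to grade yet; report no failures rather than dividing by zero.
        return 1.0, []
    failures = [r for r in runs if r.agent_label != r.expected_label]
    return 1 - len(failures) / len(runs), failures


runs = [
    Run(1, "duplicate", "duplicate"),
    Run(2, "ready-to-implement", "needs-info"),
]
accuracy, failures = grade(runs)
print(accuracy, [r.issue_number for r in failures])  # 0.5 [2]
```

The failing runs play the same role that human corrections play in the human-feedback version: they are what the outer loop reads when it revises the Skill.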

§ 2

To make this concrete, say this Skill does issue triage, sorting incoming issues into a few buckets: ready-to-implement, duplicate, and needs-info. The same approach works for a code review Skill, a bug-fixing Skill, an incident response Skill, and so on.
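One way to keep the agent's answer inside those three buckets is to parse it strictly before any label is applied. This is a sketch that assumes the Skill tells the agent to end its answer with the bucket name on its own line; the article does not show the Skill's output format, and `TriageLabel` and `parse_label` are hypothetical names.

```python
from enum import Enum


class TriageLabel(str, Enum):
    # The three buckets from the example; values assumed to match repo label names.
    READY_TO_IMPLEMENT = "ready-to-implement"
    DUPLICATE = "duplicate"
    NEEDS_INFO = "needs-info"


def parse_label(agent_output: str) -> TriageLabel:
    """Take the last non-empty line of the agent's answer as its chosen bucket.

    Raises ValueError instead of guessing, so a malformed run is surfaced
    rather than silently mislabeled.
    """
    lines = [ln.strip().lower() for ln in agent_output.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("agent produced no output")
    return TriageLabel(lines[-1])  # ValueError if not one of the three buckets


print(parse_label("Asks for a new setting; scope unclear.\nneeds-info\n").value)
```

Failing loudly on an unknown bucket keeps bad runs out of the label history that the outer loop later learns from.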

Here’s what a first draft of the Skill might look like:

Full triage-issue Skill

§ 3

You need to set up two loops:

  1. An inner agent loop: this is where you actually apply the Skill. For issue triage, you could run it manually, or, more likely, you have an integration with your task tracker that runs the Skill whenever a new issue is filed. Interactions with the Skill are recorded somewhere: in a file, an agent trace, or an interaction in an external system like Slack or GitHub.
  2. An outer agent loop: this is an agent that runs on a schedule and observes how the inner loop uses the Skill. For the issue triager, this will likely be a cloud agent that pulls records of every time the triage agent ran. Its job is to look at all of the inner agent's runs and adjust the Skill based on how those runs performed. Since Skills are just files, this means it should produce a diff that improves the Skill based on user feedback from past runs.
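The division of labor between the two loops can be sketched as plain data flow. This is a simplified model, not Oz's API: `classify` stands in for a cloud agent run, and `TriageRecord`, `inner_loop`, and `outer_loop` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TriageRecord:
    # What the outer loop can reconstruct about one inner-loop run.
    issue_number: int
    agent_label: str
    final_label: str  # the label on the issue when the outer loop looks
    comments: list[str] = field(default_factory=list)


def inner_loop(issue_number: int, classify: Callable[[int], str]) -> TriageRecord:
    """Apply the Skill to one new issue; `classify` stands in for the agent run."""
    label = classify(issue_number)
    return TriageRecord(issue_number, agent_label=label, final_label=label)


def outer_loop(records: list[TriageRecord]) -> list[TriageRecord]:
    """Pick out runs a human overrode: the evidence for the next Skill diff."""
    return [r for r in records if r.final_label != r.agent_label]


records = [inner_loop(n, lambda _: "ready-to-implement") for n in (101, 102)]
# A reviewer later relabels #102 and explains why.
records[1].final_label = "needs-info"
records[1].comments.append("Unclear whether this feature needs a setting.")
print([r.issue_number for r in outer_loop(records)])  # [102]
```

The inner loop only writes records; the outer loop only reads them. That separation is what lets the outer loop run on its own schedule.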

I’ll show you how to do this in practice using Warp and Oz, our cloud agent platform, but there are many ways to accomplish it. We will use GitHub Issues as the issue tracker.

Here is a sample repo with the Skills and GitHub workflows to follow along.

§ 4

The inner agent loop uses a GitHub Action that runs every time a new issue is created.

Full GitHub Action

The GitHub Action invokes a cloud agent through Oz, Warp’s cloud agent platform. This cloud agent syncs the repo, pulls in the issue contents from GitHub, and tries to classify the issue. The setup code is in the sample repo mentioned above.
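The Oz invocation itself lives in the sample repo, so this sketch covers only the GitHub side. In a workflow triggered by `on: issues` with `types: [opened]`, GitHub Actions writes the event payload to the file named by the `GITHUB_EVENT_PATH` environment variable, and a script can read that file and assemble the text handed to the agent. The prompt wording and the `load_event` and `build_triage_prompt` helpers are assumptions for illustration.

```python
import json
import os
from typing import Optional


def load_event(path: Optional[str] = None) -> dict:
    """Read the webhook payload that GitHub Actions saves for the triggering event."""
    with open(path or os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as f:
        return json.load(f)


def build_triage_prompt(event: dict) -> str:
    """Turn an `issues` event payload into task text for the triage agent.

    The wording is illustrative, not the article's actual prompt.
    """
    issue = event["issue"]
    body = issue.get("body") or "(no description)"  # body can be null
    return (
        "Triage this GitHub issue using the triage-issue skill.\n"
        f"Repository: {event['repository']['full_name']}\n"
        f"Issue #{issue['number']}: {issue['title']}\n\n"
        f"{body}\n"
    )


sample = {
    "action": "opened",
    "issue": {"number": 42, "title": "Add a dark theme", "body": None},
    "repository": {"full_name": "acme/app"},
}
print(build_triage_prompt(sample))
```

In a real workflow step you would call `build_triage_prompt(load_event())` and pass the result to whatever starts the agent run.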

Now, when a new issue comes in, a cloud agent runs the inner loop’s triage Skill and applies a label; in this example, the label marks a new feature request as ready to implement.
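The article doesn't say how the agent applies the label; it could use the `gh` CLI, the REST API, or tooling in the sample repo. As one possibility, here is a request to GitHub's documented "Add labels to an issue" endpoint, built with the standard library but not sent. The repository name and token are placeholders, and `add_label_request` is a hypothetical helper.

```python
import json
import urllib.request


def add_label_request(repo: str, issue_number: int, label: str, token: str) -> urllib.request.Request:
    """Build a POST to /repos/{owner}/{repo}/issues/{issue_number}/labels (not sent)."""
    return urllib.request.Request(
        f"https://api.github.com/repos/{repo}/issues/{issue_number}/labels",
        data=json.dumps({"labels": [label]}).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )


req = add_label_request("acme/app", 42, "ready-to-implement", token="<token>")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

This endpoint adds labels without removing existing ones, which matters later: the outer loop relies on seeing the agent's label and any human change to it as separate events.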

§ 5

Say, though, that a human reviewer disagrees with the agent’s label. Looking at the labels the agent assigned, I switch the issue from “ready to implement” to “needs info” and add a comment to the thread explaining why it was miscategorized, e.g. because it is unclear whether we should add a setting for the new feature.

Here’s where the outer loop becomes interesting. The outer loop agent runs once a day and looks at all issues that have been triaged. When it runs, it finds that I manually changed the label and explained why.
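To find corrections, the outer agent has to tell the bot's label changes apart from a human's. The sketch below works on data shaped like GitHub's issue events API (`GET /repos/{owner}/{repo}/issues/{issue_number}/events`, whose entries carry `event`, `actor.login`, `label.name`, and `created_at`); the reviewer's explanation would be fetched separately from the issue comments endpoint. The bot login is an assumption: it depends on which credential applies the labels (with `GITHUB_TOKEN` it is typically `github-actions[bot]`). `daily_window` and `human_corrections` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TRIAGE_LABELS = {"ready-to-implement", "duplicate", "needs-info"}


def daily_window(now: Optional[datetime] = None) -> datetime:
    """Start of the look-back window for a once-a-day outer loop."""
    return (now or datetime.now(timezone.utc)) - timedelta(days=1)


def human_corrections(events: list[dict], bot_login: str, since: datetime) -> list[dict]:
    """Keep triage-label changes made after `since` by anyone other than the bot."""
    corrections = []
    for ev in events:
        if ev.get("event") not in ("labeled", "unlabeled"):
            continue
        if ev["label"]["name"] not in TRIAGE_LABELS or ev["actor"]["login"] == bot_login:
            continue
        # GitHub timestamps look like "2026-06-17T10:00:00Z".
        created = datetime.fromisoformat(ev["created_at"].replace("Z", "+00:00"))
        if created >= since:
            corrections.append(ev)
    return corrections


events = [
    {"event": "labeled", "actor": {"login": "github-actions[bot]"},
     "label": {"name": "ready-to-implement"}, "created_at": "2026-06-17T09:00:00Z"},
    {"event": "unlabeled", "actor": {"login": "reviewer"},
     "label": {"name": "ready-to-implement"}, "created_at": "2026-06-17T10:00:00Z"},
    {"event": "labeled", "actor": {"login": "reviewer"},
     "label": {"name": "needs-info"}, "created_at": "2026-06-17T10:00:05Z"},
]
since = daily_window(datetime(2026, 6, 18, 6, 0, tzinfo=timezone.utc))
for ev in human_corrections(events, "github-actions[bot]", since):
    print(ev["event"], ev["label"]["name"])
```

Filtering on the actor rather than on label values means a human confirming the agent's label by re-adding it still counts as a signal, which may or may not be what you want.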

Full improve-triage-issue Skill

Because the outer loop’s Skill runs through a coding agent, it takes the feedback I provided and makes a diff to update the triage Skill.
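In the article, the coding agent edits the Skill file itself. This sketch only illustrates the shape of the change it might propose: it appends a rule distilled from the reviewer's comment and renders the change with Python's `difflib`. The file path, heading, and `add_rule`/`skill_diff` helpers are hypothetical.

```python
import difflib

SKILL_PATH = "skills/triage-issue/SKILL.md"  # hypothetical location


def add_rule(skill: str, rule: str, heading: str = "## Lessons from human feedback") -> str:
    """Append a feedback-derived rule, creating the heading if it is missing.

    Simplification: assumes the heading is the last section of the file.
    """
    if heading not in skill:
        skill = skill.rstrip("\n") + f"\n\n{heading}\n"
    return skill.rstrip("\n") + f"\n- {rule}\n"


def skill_diff(old: str, new: str, path: str = SKILL_PATH) -> str:
    """Render the proposed Skill change as a unified diff, as a PR would show it."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


old = "# Triage an issue\n- Label clear feature requests ready-to-implement.\n"
new = add_rule(old, "If it is unclear whether a feature needs a new setting, use needs-info.")
print(skill_diff(old, new))
```

Keeping the change as a reviewable diff, rather than overwriting the Skill in place, is what lets a human approve the merge before it reaches the inner loop.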

Once that diff merges, the change feeds back into the Skill that drives the inner loop agent, so the next time the agent runs, the Skill should work better.

I’d love to know whether this is useful for folks. We use self-improvement loops to manage the Warp open-source repository, and we extracted the framework behind it for others to adopt. An early version is here.
