Glean 拾遗

Thin Harness, Fat Skills: Building a Self-Evolving AI Agent System from Five Concepts

Source: x.com | Collected: 2026-06-21 06:00 | Reading time: 12 min
AI summary

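YC partner Garry Tan lays out an approach to AI agent architecture that does not depend on "better models": the bottleneck for agents is not model intelligence but the management of context and process. He defines the architecture with five core concepts: Skill files (reusable procedure files written in Markdown), Harness (a lean loop that runs the model and manages context), Resolvers (routing tables for loading context), Latent vs. deterministic (a strict boundary between work that needs intelligence and work that must be deterministic), and Diarization (extracting structured analytical briefs from unstructured information). The approach is put into practice in the YC Startup School matching system, which invokes the same set of skill files with different parameters to handle grouping, lunch table assignments, and real-time matching, and which can rewrite its own skill rules by analyzing lukewarm "it was OK" feedback, so the system evolves on its own. The article gives front-line engineering and product teams concrete, actionable principles for agent design, and is especially relevant to engineers who are building AI workflows and need to balance model capability against system reliability.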

§ 1

Steve Yegge says people using AI coding agents are "10x to 100x as productive as engineers using Cursor and chat today, and roughly 1000x as productive as Googlers were back in 2005."

That's a real number. I've seen it. I've lived it. But when people hear it, they reach for the wrong explanation. Better models. Smarter Claude. More parameters. The 2x people and the 100x people are using the same models. The difference isn't intelligence. It's architecture — and it fits on an index card.

§ 2

On March 31, 2026, Anthropic accidentally shipped the entire source code for Claude Code to the npm registry. 512,000 lines. I read it. It confirmed everything I'd been teaching at YC: the secret isn't the model. It's the thing wrapping the model.

Live repo context. Prompt caching. Purpose-built tools. Context bloat minimization. Structured session memory. Parallel sub-agents. None of that makes the model smarter. All of it gives the model the right context, at the right time, without drowning it in noise.

That wrapper is called the harness. And the question every AI builder should be asking is: what goes in the harness, and what stays out? The answer has a specific shape. I call it thin harness, fat skills.

§ 3

The bottleneck is never the model's intelligence. Models already know how to reason, synthesize, and write code. They fail because they don't understand your data — your schema, your conventions, the particular shape of your problem. Five definitions fix this.

  1. Skill files

A skill file is a reusable markdown document that teaches the model how to do something. Not what to do — the user supplies that. The skill supplies the process.

Here's the key insight most people miss: a skill file works like a method call. It takes parameters. You invoke it with different arguments. The same procedure produces radically different capabilities depending on what you pass in.

Consider a skill called /investigate. It has seven steps, among them: scope the dataset, build a timeline, diarize every document, synthesize, argue both sides, and cite sources. It takes three parameters: TARGET, QUESTION, and DATASET. Point it at a safety scientist and 2.1 million discovery emails, and you get a medical research analyst determining whether a whistleblower was silenced. Point it at a shell company and FEC filings, and you get a forensic investigator tracing coordinated campaign donations.

Same skill. Same seven steps. Same markdown file. The skill describes a process of judgment. The invocation supplies the world.

This is not prompt engineering. This is software design, using markdown as the programming language and human judgment as the runtime. Markdown is, in fact, a more perfect encapsulation of capability than rigid source code, because it describes process, judgment, and context in the language the model already thinks in.
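The "skill file as method call" idea can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the file body, the `invoke_skill` helper, and the wording of the two questions are assumptions. Only the skill name, the three parameter names, and the six step names listed above come from the text.

```python
from string import Template

# Hypothetical skill file body. Only the step names and the three
# parameter names (TARGET, QUESTION, DATASET) come from the article.
INVESTIGATE_SKILL = Template("""\
# /investigate

Target: $TARGET
Question: $QUESTION
Dataset: $DATASET

Steps:
- Scope the dataset.
- Build a timeline.
- Diarize every document.
- Synthesize.
- Argue both sides.
- Cite sources.
""")


def invoke_skill(skill: Template, **params: str) -> str:
    """Bind parameters to a skill file, like arguments to a method call."""
    # substitute() raises KeyError when a placeholder has no value,
    # which mirrors calling a method with a missing argument.
    return skill.substitute(**params)


# Same procedure, two different "worlds" passed in as arguments.
whistleblower = invoke_skill(
    INVESTIGATE_SKILL,
    TARGET="a safety scientist",
    QUESTION="Was a whistleblower silenced?",
    DATASET="2.1 million discovery emails",
)
donations = invoke_skill(
    INVESTIGATE_SKILL,
    TARGET="a shell company",
    QUESTION="Were campaign donations coordinated?",
    DATASET="FEC filings",
)
```

The two rendered prompts share every line of procedure and differ only in the bound parameters, which is the property the method-call analogy rests on.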

§ 4

  2. The harness

The harness is the program that runs the LLM. It does four things: runs the model in a loop, reads and writes your files, manages context, and enforces safety. That's it. That's the "thin."

The anti-pattern is a fat harness with thin skills. You've seen it: 40+ tool definitions eating half the context window. God-tools with 2-to-5-second MCP round-trips. REST API wrappers that turn every endpoint into a separate tool. Three times the tokens, three times the latency, three times the failure rate.

What you want instead is purpose-built tooling that's fast and narrow. A Playwright CLI that does each browser operation in 100 milliseconds, not a Chrome MCP that takes 15 seconds for screenshot-find-click-wait-read. That's 75x faster. Software doesn't have to be precious anymore. Build exactly what you need, and nothing else.
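As a rough sketch of how little a thin harness needs to contain, the loop below covers the four duties named above: run the model in a loop, read and write files, manage context, and enforce safety. Everything in it is an assumption made for illustration: the action dictionary format, the `run_harness` name, the item-count context budget, and the "stay inside the working directory" safety rule. The model is passed in as a plain callable, so no real LLM API is implied.

```python
from pathlib import Path
from typing import Callable

# Hypothetical action format returned by the model, for example:
#   {"op": "read", "path": "notes.md"}
#   {"op": "write", "path": "notes.md", "text": "..."}
#   {"op": "done", "answer": "..."}
Action = dict
Model = Callable[[list], Action]

MAX_CONTEXT_ITEMS = 20  # crude stand-in for a token budget


def run_harness(model: Model, task: str, workdir: Path, max_steps: int = 10) -> str:
    root = workdir.resolve()
    context = [f"TASK: {task}"]
    for _ in range(max_steps):                    # 1. run the model in a loop
        action = model(context)
        if action["op"] == "done":
            return action["answer"]
        path = (root / action["path"]).resolve()
        if not path.is_relative_to(root):         # 4. enforce safety
            context.append(f"DENIED: {action['path']} is outside the workspace")
        elif action["op"] == "read":              # 2. read files
            context.append(f"FILE {action['path']}:\n{path.read_text()}")
        elif action["op"] == "write":             # 2. write files
            path.write_text(action["text"])
            context.append(f"WROTE {action['path']}")
        # 3. manage context: keep the task plus the most recent items
        context = context[:1] + context[1:][-(MAX_CONTEXT_ITEMS - 1):]
    return "step limit reached"
```

Nothing domain-specific lives in this loop. Under the article's framing, that knowledge belongs in skill files the model reads through the same file operations.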

§ 5

  3. Resolvers

A resolver is a routing table for context. When task type X appears, load document Y first.

Skills tell the model how. Resolvers tell it what to load and when. A developer changes a prompt. Without the resolver, they ship it. With the resolver, the model reads docs/EVALS.md first — which says: run the eval suite, compare scores, if accuracy drops more than 2%, revert and investigate. The developer didn't know the eval suite existed. The resolver loaded the right context at the right moment.
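The routing-table idea and the eval rule from this example can be sketched as follows. This is illustrative only: the `RESOLVER` table, the task type names, and `docs/MIGRATIONS.md` are hypothetical, and "drops more than 2%" is assumed here to mean two percentage points of accuracy. Only `docs/EVALS.md` and the 2% threshold come from the text.

```python
# Hypothetical routing table: task type -> documents to load first.
RESOLVER = {
    "prompt_change": ["docs/EVALS.md"],
    "schema_migration": ["docs/MIGRATIONS.md"],  # made-up entry
}


def resolve(task_type: str) -> list:
    """Return the documents to load before the model starts the task."""
    return RESOLVER.get(task_type, [])


def should_revert(baseline_accuracy: float, new_accuracy: float,
                  max_drop: float = 0.02) -> bool:
    """Revert when accuracy falls by more than max_drop.

    Assumption: the threshold is in absolute percentage points
    (0.02 == 2 points), not a relative change.
    """
    return (baseline_accuracy - new_accuracy) > max_drop
```

For instance, `resolve("prompt_change")` returns `["docs/EVALS.md"]`, and `should_revert(0.90, 0.87)` is true because a three-point drop exceeds the threshold.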

Claude Code has a built-in resolver. Every skill has a description field, and the model matches user intent to skill descriptions automatically. You never have to remember that /ship exists. The description is the resolver.
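To make "the description is the resolver" concrete, here is a toy matcher. It is a deliberately crude stand-in: in the setup described above the model itself matches intent to descriptions, whereas this sketch just counts shared words. The `SKILLS` table, both description strings, and `pick_skill` are invented for illustration.

```python
import re
from typing import Optional

# Hypothetical skills. The names /ship and /investigate appear in the
# article; the descriptions are invented.
SKILLS = {
    "/ship": "Run tests, open a pull request, and deploy the current branch.",
    "/investigate": "Scope a dataset, build a timeline, and cite sources for a question.",
}


def words(text: str) -> set:
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def pick_skill(user_request: str) -> Optional[str]:
    """Pick the skill whose description shares the most words with the request.

    Word overlap is only a placeholder for the model's own matching.
    """
    request = words(user_request)
    best = max(SKILLS, key=lambda name: len(request & words(SKILLS[name])))
    return best if request & words(SKILLS[best]) else None
```

A request such as "please deploy this branch" lands on `/ship` without the user naming it. Common words like "a" also count as matches here, which is one reason a real system leaves the matching to the model.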

A confession: my CLAUDE.md was 20,000 lines. Every quirk, every pattern, every lesson I'd ever encountered. Completely ridiculous. The model's attention degraded. Claude Code literally told me to cut it back. The fix was about 200 lines — just pointers to documents. The resolver loads the right one when it matters. Twenty thousand lines of knowledge, accessible on demand, without polluting the context window.
