AI 代理的编排税:为什么开 20 个 agent 不等于 20 倍产出
Addy Osmani 提出「编排税」概念:启动 AI 代理很便宜,但验证、合并、做判断的环节是串行的——你的认知带宽无法并行化。他用 Amdahl 定律和 Python GIL 做类比:你就是系统中的单线程瓶颈,代理再多也只能排队等待你的判断。文章给出了具体应对策略:按 review rate 限制并行数、把任务分成「可异步委托」和「复杂判断」两类、批量做代码审查、让代理自证而非靠你验证。适合已经在日常使用 AI 代理、感到「忙但不出活」的一线工程师。
By @addyosmani · 2026-05-28T03:48:32.000Z

Starting more AI agents is easy now. However, more agents running doesn’t mean there's more of you available - your cognitive bandwidth doesn’t parallelize. All the judgement to actually steer them and merge the changes they make still has to route through exactly one serial processor which is just you. Orchestration tax is basically the price you pay for forgetting this and the only real fix is to start architecting your own attention like you architect any concurrent system.
I was on a panel at Google I/O with Richard Seroter, Aja Hammerly and Ciera Jaspan talking about what software engineering looks like right now and how it will probably evolve. Near the end Richard asked us what is one thing developers should walk away and do differently. I said the thing I been circling around for months: feeling busy is definitely not the same as being productive. You can run 20 agents and feel completely busy. But thats not 20 agents worth of shipped work.
Earlier in that chat Richard gave this problem a name. "You talked about the orchestration tax" he said. "You can't manage twenty agents successfully in your own brain." He is totally right. I want to breakdown this idea properly because its not a discipline problem. It is an architecture problem.
The line from the panel I keep thinking about is something I said almost randomly: running multiple agents does not mean there is more of you.
作者:@addyosmani · 2026-05-28T03:48:32.000Z

现在启动更多的 AI 代理很容易。但运行更多代理并不代表你这个人就变多了——你的认知带宽无法并行化。所有导航决策和合并修改的判断,最终仍然流经唯一的一个串行处理器:你本人。编排税就是你忘记这一点所付出的代价,唯一的解决办法是像架构任何并发系统一样,去架构你自己的注意力。
我在 Google I/O 上参加了一场讨论,同台的有 Richard Seroter、Aja Hammerly 和 Ciera Jaspan,聊的是软件工程的现状和可能的演变方向。临近结束时,Richard 问我们,开发者最应该带走并做出改变的一件事是什么。我说了我这几个月一直在绕圈子的话:感觉忙碌和真正高效绝对不是一回事。你可以跑 20 个代理并感觉忙得不行,但那只代表 20 个代理在跑,不代表产生了 20 个代理价值的上线交付物。
在那次讨论更早的时候,Richard 给这个问题起了个名字。他说:“你提到了编排税。你不可能在自己的大脑里同时成功地管理二十个代理。”他说得完全对。我想好好拆解一下这个想法,因为它不是纪律问题,而是架构问题。
那天讨论后我反复想起的一句话,是我几乎随口说的:“同时跑多个代理,不代表你就变多了。”
The asymmetry people don’t price in
There is this hidden asymmetry in agentic workflows. Starting an agent is very cheap. It is just a keystroke or a sentence prompt. But closing the loop on the agent is not cheap at all. Someone has to check if what came back is correct and reconcile it with whatever the other agents touched. That someone is you. And there is exactly one of you.
I wrote about a piece of this last month in Your parallel Agent limit, mostly about the ambient anxiety of not knowing which paralell thread is quietly failing. This post is about the shape underneath that cost. When you start seeing agent development as a concurrent system, you realize the human is just a component inside it. The slow serial component.
人们没有算进去的隐藏不对称
在代理工作流中藏着一个不对称:启动一个代理非常廉价,无非是一次按键或一句话的 prompt。但关闭这个代理的循环一点也不廉价。必须有人检查返回的结果是否正确,并把它与其它代理碰过的东西做协调。这个人就是你。而且你只有一个。
上个月我在《你的并行代理极限》一文中写过这个问题的一部分,主要说的是不知道哪个并行线程在无声地失败所带来的弥漫焦虑。这篇 post 讲的是成本之下的结构形状。当你开始把代理开发看作是并发系统时,你会发现人只是系统中的一个组件——那个缓慢的串行组件。
You are the single thread resource
If you ever wrote concurrent code you already have the right intuition. You just been pointing it at the wrong part of the system.
Python has the Global Interpreter Lock (GIL). You can spawn as many threads as you want but only one executes python bytecode at a time because they must acquire the lock. You are the GIL of your AI agents. They all can run at once. But when any of their work needs genuine understanding of the architecture or resolving merge conflicts, that work has to acquire the lock. There is one lock. You hold it.
Amdahl’s Law makes this very precise. The speedup you get from parallelizing is capped by the fraction of work that stays serial. If a big chunk of your pipeline cant be parallelized, you top out at a hard limit no matter how many cores you throw at it. In agent development the serial fraction is the judgement. Spawning 8 agents doesn’t speed up your judgement time. It just makes the queue of things feeding into it much deeper.
This is an old performance engineering fact that still surprise people: optimizing the non bottleneck part doesn’t increase throughput. You just grow the pile of unfinished work sitting in front of the bottleneck. Adding agents optimize the part that was never the constraint. The constraint is the review step and the throughput of your system equals exactly the throughput of that step. The orchestration tax is the structural gap between agent production and what you can actually merge. It’s what happens when you put a single-threaded resource in charge of a concurrent one.
你就是单线程资源
如果你写过并发代码,你已经有了正确的直觉——只不过你一直把直觉用在了系统里错误的部分。
Python 有全局解释器锁(GIL)。你可以想创建多少线程就创建多少,但同一时间只有一个线程在执行 Python 字节码,因为线程必须获取锁。你就是你的 AI 代理的 GIL。所有代理可以同时运行,但只要它们的工作需要真正理解架构或解决合并冲突,那部分工作就必须获取锁。锁只有一个,握在你手里。
阿姆达尔定律把这个现象说得很精确:并行化能得到的加速比,被必须串行的工作比例所限制。如果你的流程中有一大块无法并行化,那么无论你扔多少核进去,都撞上了一个硬上限。在代理开发中,串行的那部分就是“判断”。创建 8 个代理并不会加快你的判断速度,它只会让等待判断的队列变得更深。
这是一个老旧但依然让人惊讶的性能工程事实:优化非瓶颈的部分不会提高吞吐量,你只是让堆积在瓶颈前的未完成工作变多而已。增加代理优化的是从来就不是约束的那个部分。约束是 review 步骤,你系统的吞吐量恰好等于那个步骤的吞吐量。编排税,就是代理产出和你实际能合并之间的结构差距——当你让一个单线程资源负责一个并发系统时,它必然发生。
Grinding won't fix structural limits
At the panel I said I never felt more productive with my tools but I am also more tired than I ever been. Both halves are completely real and they have the same cause.
The tiredness has a very specific cause. It is how running a serial processor at 100% with no slack feels like. Everytime you check on an agent you been away from you pay a context switch cost. You flush your brain and reload a different context from cold. CPUs do this in microseconds and architects still work hard to avoid it. You do it in minutes and you never reload the context perfectly. Five agents is not 1x workload done five times. It is 5 cold reloads plus a background brain process constantly worrying about which agent you should be checking.
You can’t just try harder to fix a structural limit. The tax will be paid anyway. If you try to grind it out, the limit just shows up as shallow code reviews or experiencing cognitive surrender where you just accept the agent’s code because forming your own opinion costs attention you don’t have anymore. You either pay the tax deliberately or you let it quietly destroy your understanding of your own system.
死磕修复不了结构限制
在那场讨论中我说,我的工具从来没让我感觉这么高效过,但我也从未这么累过。这两半都完全真实,而且有着同一个原因。
这种累有一个非常具体的原因:它就是一个串行处理器以 100% 利用率、毫无余裕运转时的感受。每次你去看一个之前离开的代理,都要付出一次上下文切换的代价。你把大脑清空,从冷启动重新加载另一个上下文。CPU 做这件事只需微秒,而且架构师们仍然努力在避免它。你以分钟为单位来做,还永远做不到完美重载。五个代理不是 1 倍工作量乘以 5,而是 5 次冷启动重载,外加一个后台大脑进程在不断焦虑下一个该检查哪个代理。
你不能靠更努力去修复一个结构限制。税反正都要付。如果你死磕,限制只是换了一种形式出现:浅尝辄止的代码审查,或者经历认知投降——你接受代理写的代码,因为形成自己的意见所需的注意力你已经没有了。你要么主动缴纳编排税,要么让它无声地摧毁你对自身系统的理解。
Architect your attention
So you have to treat your attention as the scarce serial resource it is. You wouldn’t design a distributed system without thinking hard about the bottleneck. Give your brain the same respect.
Some things that actually held up for me:
Scale fleet to review rate, not the UI. A good concurrent system uses backpressure so the queue doesn’t grow infinitely. The producer slows down to match the consumer. Your agent count is the producer and your review rate is the consumer. The right number of parallel agents is how many you can actually code review properly. For most of us this is a low single digit. The AI tool will happily let you spawn 20 but that is just a UI feature.
Sort the work. I mentioned this to Richard when he asked how I navigate it. I keep two piles of tasks. One is isolated work that I’m happy to delegate to background agents running in the Cloud. These can run async and often just need me at the final gate. The other pile is complex tasks where the judgement is the work. Like a weird bug or architecture design. The big mistake is trying to paralellize the second pile. Doing multiple complex tasks doesn’t scale your output. It just thrashes the lock and everything comes out worse.
Batch your reviews. Context switching cost you heavily everytime you do it. Reviewing 4 agents at the same time in one sitting is much cheaper than checking one, leaving to do something else and returning cold. Give agents a long leash. Let the work pile up a bit and process the batch.
Only spend the lock on judgement. Dont waste your brain on things the machine can verify itself. Make the agent write a passing test or generate a screenshot. They can prove the boring 80% themselves so you only spend your scarce attention on the 20% that genuinely needs a human.
Protect your serial time. The bottleneck needs your best hours, not the leftover minutes between agent check-ins. Sometimes the highest leverage move is to stop orchestrating entirely, close the laptop full of agents and just think hard about one single problem with the lock held the whole time. Orchestrating is not the real work. Its the overhead around the work.
Aja pointed out that architecture is the urgent skill now. Knowing what belongs inside one agent and what is too much for it. I would add that you are a component in that system. Your attention has a known, low serial throughput. The system either respects that number or it routes around it by secretly lowering your standards.
架构你的注意力
所以你必须像对待稀缺的串行资源那样对待你的注意力。你不会在没仔细考虑瓶颈的情况下就设计一个分布式系统。给你的大脑同样的尊重。
一些对我确实有效的做法:
根据审查率而非 UI 来扩展代理数量。 一个好的并发系统使用背压机制,让队列不会无限增长——生产者会降速以匹配消费者。你的代理数量是生产者,你的 review 速率是消费者。正确的并行代理数量,是你实际上能好好完成代码审查的数量。对大多数人来说,这是个位数。AI 工具乐你创建 20 个,但那只是 UI 特性。
对任务做分类。 我在 Richard 问我如何应对时提到了这一点。我把任务分成两堆。一堆是孤立的工作,我很乐意交给在云端跑的后台代理去做;这些任务可以异步执行,通常只需要我在最后关口看一眼。另一堆是复杂任务,判断本身就是工作——比如一个奇怪的 bug 或架构设计。最大的错误就是试图去并行化第二堆任务。做多个复杂任务不会让你的产出成倍增长,它只会激烈争抢锁,最终什么都搞砸了。
批量审查。 每次切换上下文的代价都很高。一次性坐下来审查 4 个代理的结果,比检查一个、离开去做别的事、再冷启动回来要划算得多。给代理长一点的绳,让工作积攒一些,然后批量处理。
只把锁用在判断上。 不要把大脑浪费在机器自己就能验证的事情上。让代理写一个通过的测试或生成截图。它们能自己证明那无聊的 80% ,这样你就只需把稀缺的注意力花在真正需要人类的 20% 上。
保护你的串行时间。 瓶颈需要你最好的时间,而不是代理检查间隙的零碎分钟。有时候最高杠杆的做法是完全停止编排,合上满是代理的笔记本,全程把锁握住,只深入思考一个单一的问题。编排不是真正的工作,它是工作周围的开销。
Aja 指出,架构现在是紧迫的技能——知道哪些事属于单个代理,哪些事对单个代理来说太多了。我想补充的是:你也是那个系统里的一个组件。你的注意力有一个已知的、很低串行吞吐量。系统要么尊重这个数字,要么通过悄悄降低你的标准来绕过它。
Busy vs Productive
This is really important because the failure mode is invisible to you. Twenty running agents gives you this feeling of massive productivity. The dashboard is full and everything moves. But that feeling is decoupled from actually shipping good code to main. You can be maximally busy and barely produce anything. From the inside it feels identical.
Ciera pointed out Margaret-Anne Storey’s work on debt. We talked about technical debt and cognitive debt. The orchestration tax left unpaid is how you accumulate both at once. You merge stuff you didn’t read well. Your mental model of the codebase goes completely stale. None of this shows up on the dashboard today. It shows up when production breaks and you look at the system and realize you have no idea how it works anymore.
So this is the actual takeaway. Spawning agents is not the real skill. Anyone can run 20.
The real skill is designing the system around the one serial resource that cannot be cloned or parallelized. That resource is your attention.
Architect it the way you architect anything else you depend on in production.
忙碌 vs 高效
这非常重要,因为这个失败模式对你来说是看不见的。跑着二十个代理会给你一种产出巨大的感觉:面板满满当当,一切都在运转。但那种感觉与实际把好代码交付到主分支是脱节的。你可以极其忙碌却几乎没有任何产出。从内部来看,它们的感觉一模一样。
Ciera 提到了 Margaret-Anne Storey 关于债务的研究。我们聊到了技术债务和认知债务。未偿付的编排税,就是你一次性同时积累两种债务的方式。你合并了你没仔细读的代码,你对代码库的心智模型变得完全过时。这些都不会在今天的面板上显示出来。它们在生产环境出问题、你看着系统却发现自己完全不知道它怎么工作的时候才会现身。
所以这才是真正的要点:创建代理不是真正的技能,任何人都能跑 20 个。
真正的技能是围绕那一个无法被克隆或并行化的串行资源来设计系统。那个资源就是你的注意力。
像架构生产中的其他一切依赖那样去架构你的注意力吧。