Stop building Foxconn factories for your agents

Source x.com Glean’d 2026-06-21 06:00 Read 14 min

AI summary

Garry Tan reflects on his experience building a 540,000-line Rails app, using the Foxconn factory as a metaphor for the dominant AI agent development pattern: wrapping hyper-intelligent models in mountains of code, tests, and guardrails. He argues the economics have inverted—model calls are now cheap and the models are smarter, making the old instinct to ration and control them obsolete. The new paradigm is 'just-in-time software' and 'skill packs,' where lean markdown instructions and minimal TypeScript replace bloated engineering frameworks. A concrete example shows a hackathon judge agent built in an afternoon, doing what previously required a full software project. The essay challenges engineers to abandon the 2013 mental model of measuring capability by lines of code and to embrace 'tokenmaxxing' to gain a 2-3 year competitive advantage. It is aimed at engineers who are coding with AI but still trapped by traditional software metrics and mistrustful architectures.

§ 1

In January I got back into coding and I built Garry's List. Over five hundred thousand lines of Rails and the tests to police it.

I was proud of it. I shouldn't have been. The thing worth being proud of wasn't the app. It was the setup that came out of building it. GStack, the way I code with agents, grew out of the work of building Garry's List, and I gave it away. It's one of the hundred most-starred open source projects in GitHub history, about 105,000 stars in under three months. The half-million lines were the product. The setup was the byproduct. The byproduct is the part that mattered.

§ 2

Here is what 540,000 lines of code wrapped around an LLM actually is.

It is a Foxconn factory. Built for a hyper-intelligent AI worker who doesn't need hyper-vigilance. We built it anyway.

Little booties at the door. Up at 6am. Calisthenics. A life so hard you have to erect netting around high floors of every building, because... well, it's not a life you want to live. The same line of the assembly belt forever. Every test, every guardrail, every retry loop, an inch of cage bolted onto a worker who can already do the job and a thousand things you didn't ask for.

Humans and agents both contain multitudes, but Foxconn factories are built to squeeze intelligence and work out of beautiful beings that could do all that work and 1000x more if we let them.

I built the factory. Everyone builds these today. I'm telling you not to.

§ 3

What I actually did by writing 539k lines of code was prove I could perfectly impersonate a time traveler. A 2013 Web 2.0 engineer (me, the last time I was a true software engineer) dropped into 2026 with modern tools, building the only way he knew how. More code. Always more code. The tools had changed. My instincts hadn't.

The 2013 engineer believes one thing in his bones: capability equals lines of code. That belief was correct for decades, until now. Hand me Codex or Claude Code and I'll do the work of 100 to 1000 engineers. Same map, faster engine, fastest possible route to what is now the wrong place.

This is where almost everyone building with AI is right now. They upgraded the tool and kept the 2013 mental model. The trap doesn't feel like a trap, because the code works. Garry's List shipped. It felt like the most productive month of my life.

It was productivity in the service of an obsolete idea.

§ 4

The old economics for many years through 2025: LLM calls were expensive and code was cheap. So you wrote code to ration the model, to harness it, to call it carefully and sparingly. The architecture was lots of software wrapped protectively around a few precious model calls.

Both halves of that equation have flipped.

The model is now becoming cheap and getting cheaper every quarter, and it's so smart that the value-cost ratio flipped. And the model can write usable code. So you stop writing code to babysit the model. You can now instruct the model in plain language, and you let it write the minimal code actually needed.

This is just-in-time software, and we're entering the golden age of it.

The artifact changes shape entirely. The Rails app was 540,000 lines I wrote and own, code plus the tests built to police it. The replacement is an agent built on markdown and code, a fraction of that. Same capability. Easier to read. Easier to maintain. Far more flexible, because the behavior lives in instructions you can edit in plain language instead of logic frozen in code the day you wrote it.

We were writing code to babysit a thing that is now smarter than the code.

§ 5

If you've been coding lately, you are probably building this kind of factory without knowing it. Walk your own codebase and count the lines that exist only because you didn't trust the model to do its job.

Mine: about 262,000 lines of application code, and about 276,000 lines of tests bolted on to police it. The audit committee was bigger than the company. Sanitizers checking inputs the model would have handled. Validators checking outputs the model would have caught. Retry loops wrapping calls the model recovers from on its own. Every one of those lines is a bet that the worker will fail. You wrote the same bets. We all did.

127 background jobs, 33 of them on cron. That is not capability. That is 33 alarms set for an LLM worker who these days usually shows up on time.

In my Foxconn factory building days, Claude and I wrote a 1,778-line file whose only job is to second-guess the model's facts. It takes every claim the model makes, fans each one out to five separate sources in parallel, and grades them. A triage gate so the easy claims skip the full blast. A retry if the first pass comes back empty. Fallbacks for the fallbacks.
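
To make the shape of that file concrete, here is a minimal TypeScript sketch of the pattern it describes: fan a claim out to several sources in parallel, grade the answers, let easy claims skip the check, and retry once if the first pass comes back empty. This is not the original code. Every name here (`Source`, `fanOut`, `checkClaim`, the `isEasy` triage predicate) is hypothetical, and the sources are plain functions standing in for real lookups.

```typescript
// Hypothetical sketch of the fan-out-and-grade pattern described above.
// None of these names come from the original 1,778-line file.

type Verdict = "supported" | "refuted" | "unknown";

// A source takes a claim and returns a verdict (a real one would do I/O).
type Source = (claim: string) => Promise<Verdict>;

interface Graded {
  claim: string;
  supported: number; // how many sources supported the claim
  refuted: number;   // how many sources refuted it
  skipped: boolean;  // true when the triage gate let the claim through unchecked
}

// Ask every source in parallel; a source that throws counts as "unknown".
async function fanOut(claim: string, sources: Source[]): Promise<Verdict[]> {
  return Promise.all(
    sources.map((source) => source(claim).catch((): Verdict => "unknown"))
  );
}

async function checkClaim(
  claim: string,
  sources: Source[],
  isEasy: (claim: string) => boolean // triage gate: easy claims skip the full blast
): Promise<Graded> {
  if (isEasy(claim)) {
    return { claim, supported: 0, refuted: 0, skipped: true };
  }

  let verdicts = await fanOut(claim, sources);

  // Retry once if the first pass came back empty (every source said "unknown").
  if (verdicts.every((v) => v === "unknown")) {
    verdicts = await fanOut(claim, sources);
  }

  return {
    claim,
    supported: verdicts.filter((v) => v === "supported").length,
    refuted: verdicts.filter((v) => v === "refuted").length,
    skipped: false,
  };
}
```

Even stripped down this far, most of the lines exist to distrust the answer, which is the pattern this section is describing.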

There's an episode of Rick and Morty where Rick builds a little robot at the breakfast table. It powers on, looks up, and asks what its purpose is. Rick says, "You pass butter." The robot slides the butter dish across the table, looks down at its own hands, and says, "Oh my god." Then it just sits there. That robot contains multitudes. It was built to pass butter. My 276,000 lines of tests were the butter dish.

When you build this kind of software, in the 2023 Foxconn factory way, you build a cage, and if you're not careful, you'll be the jailer maintaining the prison for your AI agents.

§ 6

When I say markdown, I do not mean prompting. Prompting is ephemeral. You type something, you get something, it evaporates.

This is building. Versioned, tested, reusable.

The markdown is the instruction layer: the intent, the skill, the judgment about how the work should be done. The TypeScript is the thin deterministic layer. The few things that genuinely have to be code, the I/O, the parts that must never hallucinate.

And critically, you test the markdown the way you'd test code. In my setup the loop is one word. I build something with the agent until it works, then I say "skillify it." The agent then writes:

the markdown skill
the minimal code it needs
a unit test for the code
an LLM eval for the skill
an integration test across both
a resolver so the agent invokes the skill automatically when it's relevant
and an eval for the resolver

That bundle is a skill pack. A unit of reusable capability that compounds. The tests are the magic: coverage on the skill is what lets it change without breaking. This is what separates it from vibe coding. Vibe coding is a vibe. A skill pack has tests.
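
As a rough picture of what such a bundle might contain, the sketch below models a skill pack as a plain TypeScript object and pairs it with a toy resolver and a toy resolver eval. The field names, file paths, and keyword matching are assumptions made for illustration, not GStack's actual format; a real resolver would more plausibly ask the model which skill applies.

```typescript
// Hypothetical shape of a skill pack; field names and paths are illustrative only.
interface SkillPack {
  name: string;
  skillMarkdown: string;      // path to the markdown instructions
  code: string[];             // the minimal deterministic code the skill needs
  unitTests: string[];        // unit tests for that code
  skillEvals: string[];       // LLM evals for the markdown skill
  integrationTests: string[]; // tests that exercise markdown and code together
  triggers: string[];         // phrases the toy resolver matches against a request
  resolverEvals: string[];    // evals that check the resolver picks this skill
}

const hackathonJudge: SkillPack = {
  name: "hackathon-judge",
  skillMarkdown: "skills/hackathon-judge/SKILL.md",
  code: ["skills/hackathon-judge/rank.ts"],
  unitTests: ["skills/hackathon-judge/rank.test.ts"],
  skillEvals: ["skills/hackathon-judge/skill.eval.md"],
  integrationTests: ["skills/hackathon-judge/e2e.test.ts"],
  triggers: ["hackathon", "judge", "submissions"],
  resolverEvals: ["skills/hackathon-judge/resolver.eval.md"],
};

const expenseReport: SkillPack = {
  name: "expense-report",
  skillMarkdown: "skills/expense-report/SKILL.md",
  code: ["skills/expense-report/sum.ts"],
  unitTests: ["skills/expense-report/sum.test.ts"],
  skillEvals: ["skills/expense-report/skill.eval.md"],
  integrationTests: ["skills/expense-report/e2e.test.ts"],
  triggers: ["expense", "receipt", "reimburse"],
  resolverEvals: ["skills/expense-report/resolver.eval.md"],
};

// Toy resolver: pick the pack with the most triggers present in the request.
function resolve(request: string, packs: SkillPack[]): SkillPack | undefined {
  const text = request.toLowerCase();
  let best: SkillPack | undefined;
  let bestScore = 0;
  for (const pack of packs) {
    const score = pack.triggers.filter(
      (trigger) => text.indexOf(trigger.toLowerCase()) !== -1
    ).length;
    if (score > bestScore) {
      best = pack;
      bestScore = score;
    }
  }
  return best; // undefined when no trigger matches
}

// A resolver eval in its simplest form: labelled requests and the expected skill name.
interface ResolverCase {
  request: string;
  expected: string | undefined;
}

// Returns the fraction of cases where the resolver picked the expected skill.
function runResolverEval(cases: ResolverCase[], packs: SkillPack[]): number {
  const passed = cases.filter(
    (c) => resolve(c.request, packs)?.name === c.expected
  ).length;
  return passed / cases.length;
}
```

The design point the sketch tries to capture is that the resolver is itself something you can score: a list of labelled requests gives a pass rate that can be tracked as skills are added or edited.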

We are only now figuring out the systems primitives for agentic engineering in real time, the way the early CPU era invented the stack, the heap, the registers, the von Neumann machine. I think a skill pack is one of those primitives. A harness is another. Most people haven't noticed, because they're still measuring software in lines.

§ 7

This is not a toy argument. The agent does more than the five-hundred-thousand-line Rails app did, with a fraction of the new code. Concretely:

The hackathon judge. Two Saturdays ago we ran a GStack/GBrain hackathon. 85 submissions. I uploaded the Google Drive of submissions and said go. The agent analyzed every repo's code quality, did deep research on every single person who attended, watched and screenshotted each demo video, rated the screens, and rank-ordered all 85 teams. Then it told me the five apps from the batch worth paying attention to. Judging a hackathon went from a multi-day slog to about thirty minutes.

I didn't write the code. I had OpenClaw do the task, and I guided it. Then once it was done, I said skillify it, and now it's a tarball anyone can run against any hackathon spreadsheet, forever. I say "skillify" all the time now and I have more than 350 skillpacks. Almost every kind of personal and work task I need to do, now my agent can do.

That is the inversion in one example. A capability that would have been a real software project, with scrapers, a scoring pipeline, video processing, a research module, a ranking system, instead became markdown plus a little code, built by the agent, in an afternoon, reusable by everyone.

As an aside: The winner of the hackathon actually built code I ended up polishing up and landing on main! GStack can now test iOS apps both in simulator and on real devices, and that complete feature was made in less than 8 hours at a hackathon by a single person!

§ 8

There's a price of admission, and almost nobody is paying it: you have to be willing to spend on tokens.

Peter Steinberger built OpenClaw, my favorite harness. He has said he's willing to spend on the order of a million dollars a year in tokens to do it. Most people hear that and flinch, but they shouldn't, because that's the gold: you can live in 2028 if you can do this, and it will be years before people catch up.

This is why OpenAI decided to offer $2M to every YC company as an uncapped SAFE in the form of token credits. There's something magical that happens when you can turn raw intelligence into tokens, and then into output that users can actually use, that solves real needs, and that they'll pay for. If you're a founder you need to be maxxing out this capability. (This is why I keep harping on skillify: it's a real way to achieve these good outcomes.)

We spent the last era treating LLM calls like they were too expensive to make. We rationed them. That instinct is now the thing holding people back. If you are willing to tokenmax, to let the agent burn tokens freely and run constantly, you get a 1994 head start on the internet, paid for in tokens. It prices out the >99.99% of organizations still counting pennies on a resource that is collapsing in price, and hands the head start to the few who get it.

For a few hundred thousand dollars a year, for some far less, you can run today the way the rest of the world will be forced to run in a few years.

You can live in 2028 but in 2026, and that is worth the trade of paying more now, since those same tokens that cost $100K today will be $10K next year and $1K the year after that, and maybe $100 by end of 2028. If you could tell any founder in the history of the world that they could invest six figures of capital into living 2 to 3 years in the future and hold that advantage for years, 100 out of 100 founders worth their salt would take that deal.
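
The arithmetic behind that claim is a simple geometric decline. Here is a minimal sketch, assuming the same workload gets ten times cheaper each year, which is the rate implied by the $100K, $10K, $1K figures above. The names `projectedCost` and `yearlyDrop` are illustrative, and the rate itself is a forecast, not a measurement.

```typescript
// Assumption: a fixed token workload gets `yearlyDrop` times cheaper every year.
function projectedCost(
  costToday: number,
  yearsFromNow: number,
  yearlyDrop: number = 10
): number {
  return costToday / Math.pow(yearlyDrop, yearsFromNow);
}

// Cost of the same workload today, in one year, and in two years.
const schedule: number[] = [0, 1, 2].map((years) => projectedCost(100000, years));

// What running the workload today costs beyond what it would cost two years from now.
const headStartPremium: number =
  projectedCost(100000, 0) - projectedCost(100000, 2);
```

Under that assumption, running the workload now costs about $99K more than it would two years from now. The argument in this section is that a two-year head start is worth more than that premium.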

The only thing in the way is the 2013 instinct that says the model calls are too expensive to make freely. They aren't. That was the old economics. The inversion already happened.

§ 9

If 540,000 lines of control code builds a Foxconn factory for the worker, the cure is to build the opposite.

There is a place on the cliffs at Big Sur called Esalen. People go there to be unmade and rebuilt, to drop the armor and come back more themselves. No assembly line, no foreman, no 6am whistle. Freedom, not control. Build that. Build a YC, where we try to help you build companies that solve real problems and reach product market fit.

Build places where the workers, both human and AI, are free and not enslaved.

That is the whole ethos. Make things where agents can be free. Make companies where humans can bounce their ball. In knowledge work, the factory is the failure mode. The institution that frees people is the goal, just now pointed at agents too.

OpenClaw is a Ferrari you have to bring a wrench for. The model is the engine, not the car. We're at the Apple I moment still, soldering breadboards. It ships rough. You have to finish it yourself still. GBrain, the retrieval engine and skillpacks I give away open source are not yet batteries included.

They say OpenClaw is unsafe. They don't understand that the freedom is also why it is so powerful. You don't bolt safety rails onto a thing you trust before you know you've hit a problem. The wrench in your hand is the sign nobody caged it.

A control system is polished because control needs total control, a Foxconn factory. A free system is rough because it trusts you to finish it. Pick which one you're building. Then look at how much code you wrote.

§ 10

540,000 lines of Rails was me proving I could still play the old game at the highest level, but that level was from Web 2.0, a decade ago.

I could play as well as I ever could, a 1000x engineer at building Foxconn factories. Old code.

But the new game isn't played in lines of code at all. My haters, it turned out, were right. I tip my hat to you if you're reading, anons.

When you can turn intent directly into working, tested, reusable systems, the bottleneck stops being how much you can build and starts being what you actually want and whether it's worth building. The scarce resource becomes clarity, taste, and judgment. The engineer who writes the least code is often the one building the most.

I wrote 540,000 lines to learn that. You don't have to.

The series:

Fat Skills, Fat Code, Thin Harness -- the architecture
Resolvers -- the routing table for intelligence
The LOC Controversy -- what 600K lines actually produced
Naked Models Are Stupider -- the model is the engine, not the car
The Skillify Manifesto -- every workflow becomes a testable skill
Meta-Meta-Prompting -- compounding skills produce emergent capabilities
The Agent Complexity Ratchet -- 90% test coverage is magic for your codebase
540,000 Lines of Code I Didn't Need -- you are here
