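Why I Refuse to Vibe Code
The author explains his refusal to vibe code from several angles: frugality, being old-school, and a love of messiness. He cites Brooks’s “No Silver Bullet” argument to point out that LLMs cannot handle essential complexity, and uses DOGE’s misreading of the Social Security database as an example of the harm done by data analysis without skepticism. He stresses that friction is architectural feedback and that the joy of programming lies in process and responsibility, and he also touches on AI ethics and worries about employment. This piece is suited to engineers who take a critical view of AI coding and care about the human dimension of software development.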
There has been a lot of discussion online lately about vibe coding and how Large Language Models (LLMs) will revolutionize the field of software development. Every new model will launch us into realms of pure productivity, shipping software at the speed of thought and removing all the friction and overhead of product development. Or something like that.
Maybe. I’ll have to take your word for it. I don’t vibe code.
If it’s working for you, great! I’m not really here to argue the merits or flaws of LLMs in depth in this piece, but it’s just never clicked for me personally. This page is a “brief” accounting of various reasons why.
I’m not a purist. I’ve tried using LLMs that are integrated into an IDE. They have been useful for some tasks that are simple enough to be easily describable but annoying enough to not just do them myself. For instance, resizing a grid of square images to be smaller. I could go look at the command-line arguments for ImageMagick, but that was a perfect thing to ask the AI to do. I then tried using one of the AI tools to analyze my code in a project and a few other small tasks before it all came to an awkward halt. The system informed me that I had just run out of credits and I would need to provide a credit card to purchase more tokens if I wanted to keep going.
Now, you must understand that I come from a long line of cheapskates on both sides of my family tree. We’ve been pinching pennies and hunting bargains for centuries both here and on the other side of the Atlantic. As an example, one of my distant ancestors died during King Philip’s War because he left the safety of the fort to retrieve some cheese he had left behind when evacuating his house. So you must believe me that the idea of paying a service in perpetuity so I could think just seemed so laughably absurd and horrific that I didn’t even bother giving them my card. I closed the laptop. I uninstalled the IDE and even went back to using Emacs. And I realized that I just didn’t even notice the lack anymore.
It does help that I’m old. I’ve been writing code for a long time, especially in an industry that calls a developer with 5 years of experience a “senior engineer.” Experience is a welcome antidote to anxiety sometimes (as long as it’s not anxiety about ageism in an industry that calls a developer senior with only 5 years of experience), and the AI hype does remind me of earlier breakthroughs in low and no-code tooling. I don’t doubt that AI can be a useful tool for developers. I know there are tasks it can help with as better tooling. But these arguments always leave me thinking about accidental and essential complexity again.
Fred Brooks was old even when I was a young coder myself. As the project manager for IBM’s System/360 line of mainframes (and accompanying operating system) he had a front row seat to when all the now common ways software projects go wrong were novel and new. He collected these observations in a book, The Mythical Man-Month, which should still be required reading for software engineering courses today. My edition was a newer reprint that included a later essay titled “No Silver Bullet” where Brooks looked at the effect that new tools can have on developer productivity. To think like a programmer, you must understand that the real world is complex. Programming can be best thought of as imposing simplified representations – we call them abstractions – on top of our messy reality to make it understandable by reducing complexity. This lets us generalize specific situations into layers that can be built on top of each other. For instance the specific action of putting peanut butter onto a piece of bread could be generalized into a spread(substance) method that could take peanut butter or cream cheese as an argument. And we could use these spread methods to create higher-level functions like create_pbj() and so on. Coding in a modern high-level programming language is like standing on top of a ziggurat of abstractions, where a single line of code could trigger millions of operations on multiple systems. It’s very exciting!
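The layering described above can be sketched in a few lines of Python. The names spread(substance) and create_pbj() come from the text; the bread parameter, the string return values, and the choice of jelly as the second filling are assumptions made only so the sketch runs.

```python
def spread(substance: str, bread: str = "bread") -> str:
    """Generalized action: put any spreadable substance on a slice.

    The `bread` parameter and the string result are illustrative assumptions.
    """
    return f"{bread} with {substance}"


def create_pbj() -> list[str]:
    """Higher-level abstraction built on top of spread()."""
    return [spread("peanut butter"), spread("jelly")]


if __name__ == "__main__":
    # The same lower-level method serves a different substance unchanged.
    print(spread("cream cheese"))
    print(create_pbj())
```

The caller of create_pbj() never has to think about how spreading works, which is the whole point of stacking one abstraction on another.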
Now, what if we could keep going and abstract away the act of programming itself? This is the dream of agentic AI, where swarms of agents can be given tasks to implement on their own without supervision. Sounds great! But this is addressing what Brooks calls accidental complexity, the things that are complicated about writing code itself. In the time since the essay was written, software development has made great strides against this type of complexity. Instead of writing in low-level machine code, we can use modern dynamically interpreted languages which are compiled to assembly. Instead of remembering how to write a quick sort (trust me, you’re going to want to click that link) from scratch, I just need to call a sort method in a standard library. Instead of having to build a whole web application from scratch, I can use an existing framework. If I want to rename or restructure some code, my editor can help do that for me. AI seems like the latest iteration and some editors have already replaced their predictable old tooling for renaming and refactoring code with unpredictable AI agents. Sure, it might seem like rolling the dice, but how common is a critical failure anyway?
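As a small illustration of the standard-library point above, here is a hedged sketch in Python. The quick_sort function is a hypothetical hand-written version (a simple, non-in-place variant, not a tuned implementation); sorted is Python’s built-in.

```python
def quick_sort(items: list[int]) -> list[int]:
    """Hand-rolled quicksort: the accidental complexity we used to carry."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)


if __name__ == "__main__":
    data = [3, 1, 2, 3, 0]
    # Both calls give the same ordering; only one required writing an algorithm.
    print(quick_sort(data))
    print(sorted(data))
```

The library call removes the need to remember the algorithm. It does nothing to help decide what should be sorted, by which key, or why.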
However, even as the better tooling has diminished accidental complexity, essential complexity still remains. There still is the complicated work of designing our abstractions and systems the right way, one that is elegant, clear and maintainable. And that complexity isn’t going anywhere. This type of work takes skill and experience and wisdom hard-won from system failures past. And I’m not sure if an LLM’s fancy autocomplete approach works so well with this type of complexity, which often isn’t so straightforward to solve. Maybe with prompting, it could be guided toward a preferred approach, but at that point the person doing the guiding might as well design the approach alone, since the LLM wouldn’t be able to articulate why it chose a certain path. Essential complexity is often weird and rare and messy. Maybe I’m wrong and the models are getting better at these kinds of messy situations as well, but I’ve found that it often requires a specific kind of mindset and approach. Luckily for me, I love the messy stuff.
I’ve been talking so far about how software can abstract processes, but we also use abstraction’s reductive properties as a tool to understand the world. In the classic book Seeing Like a State, James Scott describes how the motivating project of post-Enlightenment states was to make their populations and possessions legible through abstraction and categorization. To measure is to modify. For instance, a country might begin to look at its forests not as complex ecosystems but as assets assessed only by the percentage of their timber that can be used for ship-building. This view then allows a country to act on this information in ways like replacing those forests with monocultures of just a single tree. A forest is abstracted into a system for growing ship masts.
This approach created the bureaucracy and the paper form, which has evolved into the web form and database. As programmers, we need to reduce the messy data of the world in order to act on it. We expect our dates to be exact. We expect names to be relatively simple. We expect data to be complete at time of entry and consistent over time. Every programmer and every system design is a series of Procrustean choices about what aspects of reality we want to reflect in our systems and what we can discard. I’m not saying this to criticize; this approach is the only way to build systems that aren’t bogged down in an endless thicket of special situations (what we call “edge cases” because they’re supposed to be rare paths on the periphery). But, this process is so innate that we sometimes forget that it is also artificial, especially when it’s describing people. Forcing a gender field to only accept “male” or “female” doesn’t force gender itself to be binary. Our definitions of race are social constructions that shift all the time. Our simplified model might provide us with insights (autism diagnoses have increased 300% over the last 20 years!) but not capture the underlying factors behind those insights (it’s likely just a result of changes in how we define autism and increased screening). It’s important to step back and look at the bigger picture of how any model was made and what type of knowledge it doesn’t capture. Every abstraction is also an occlusion. As a data journalist, I learned how to interview data and how to be highly rigorous about all the ways in which the answers I found could be misleading. Paranoia is the data journalist’s best friend, if you want to avoid an embarrassing correction. You need to be able to think about not just what the data says, but all the stuff it doesn’t include.
Unfortunately, this metacognition is something an LLM can’t ever do. The model is its reality. As Robin Sloan succinctly notes in his compelling essay “Are Language Models in Hell?”, AI models are built from and view the world in a stripped-down way. Where you and I might look at text and see its context (things like the text formatting and titles, the author’s bio, the site where this was linked from), the LLM operates purely on a world of letters and nothing more (technically, they’re receiving subword tokens, which is why early models couldn’t count the letter ‘r’ in strawberry). Asking an LLM to recognize the limitations of its view on reality is like asking a goldfish how the water is.
While writing this section, I have been thinking about DOGE’s inept attempts to find fraud at the Social Security Administration. In one example, DOGE looked at the SSA databases and discovered there were over 9 million records in there with birth dates over 120 years ago but no death dates recorded. Elon Musk declared the only possible explanation was that millions of people were fraudulently receiving benefits. He was wrong about both the cause of the problem and the severity of its impact. DOGE could’ve questioned the data quality. They could’ve examined payments being made. They could’ve asked any of the experts at SSA to explain it to them. But instead they took the data as it was and leaped to wrong conclusions, a pattern they repeated over and over (as in this example of a different fraud claim about payments).
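To make the “examine payments being made” step concrete, here is a minimal sketch of that cross-check in Python. Everything in it is an assumption for illustration: the Record fields are a hypothetical, simplified schema (not the real SSA one), the sample rows are made up, and the 120-year cutoff just mirrors the figure in the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    """Hypothetical, simplified benefits record (not the real SSA schema)."""
    person_id: int
    birth_date: date
    death_date: Optional[date]
    receiving_payments: bool


def years_since(born: date, today: date) -> int:
    """Completed years between a birth date and today."""
    before_birthday = (today.month, today.day) < (born.month, born.day)
    return today.year - born.year - before_birthday


def looks_impossible(rec: Record, today: date, max_age: int = 120) -> bool:
    """Flag records born at least max_age years ago with no death date."""
    return rec.death_date is None and years_since(rec.birth_date, today) >= max_age


def interview(records: list[Record], today: date) -> dict[str, int]:
    """Count flagged records, then ask the follow-up: is anyone being paid?"""
    flagged = [r for r in records if looks_impossible(r, today)]
    paid = [r for r in flagged if r.receiving_payments]
    return {"flagged": len(flagged), "flagged_and_paid": len(paid)}


# Made-up sample rows, for illustration only.
SAMPLE = [
    Record(1, date(1890, 1, 1), None, False),
    Record(2, date(1901, 6, 15), None, False),
    Record(3, date(1950, 3, 2), None, True),
    Record(4, date(1895, 7, 4), date(1970, 2, 1), False),
]

if __name__ == "__main__":
    print(interview(SAMPLE, date(2025, 1, 1)))
```

In this toy data, a missing death date turns out to be a gap in record-keeping and not evidence of a payment. The first count alone cannot tell those two situations apart, which is why the second question has to be asked.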
The appeal of LLM-driven development is that it’s supposed to eliminate friction. Boosters spin tales of development teams shipping dozens of features in a single day, using multiple teams of agents working autonomously at their command in increasingly strange topologies. And I get it, software development can be tedious and frustrating. It must feel super exciting to be able to churn out code at relatively ludicrous speeds and play with polished products instead of prototypes.
I need the friction though.
When I am first learning a new language or framework, I struggle with friction to do even the most basic tasks. It sucks! And when I am working with a new and unfamiliar code repository or data source, I need to set aside hours to scrutinize it. I often find myself doing a close reading, pulling up specific files to look over line by line until I understand their context and the choices their developers made. I know I could just ask an LLM to summarize the project for me and save myself the time, but I’ve found I need this process to really marinate in the code. I need it to understand not just the choices the developers made, but why they made them and how they reflect the constraints or idioms of the language they are using. I learn by failing, and if the LLM takes that work away from me, I won’t really understand what I’m doing.
Even when working in familiar languages and my own code, I still rely heavily on friction as a clue. When writing the code becomes hard, that tells me that I’m going down a wrong path with the current architecture, and that I should seriously consider redesigning things to make future enhancements easier. When that happens, I usually go out for a long walk (or sign off for the day) to give my brain space to step back and consider things from a new angle. It really works. I find these pauses so effective that I will even force them upon myself when the way seems clear. When working on large software projects, I will wait to start coding a new feature until I’ve written an Architectural Decision Record that describes what I want to do. These documents force me to capture what I’m thinking at this point in time, my assumptions about the problem and the ramifications of my approach. Sometimes, writing one even makes me realize I was too enamored with my initial hunch to see how it would go astray, and it always serves as a good way to capture “what were they thinking?” for any future inheritors of my work.
The LLM-driven approach to friction is to just code your way through it without rethinking anything. And the LLM will oblige. It’ll probably make code that will work. The performance metrics will be fine, the tests will pass (especially if they also were written by the LLM). But it won’t know why it chose that path. It doesn’t feel friction and can’t explain if one architectural approach felt cleaner than another. If the engineers crafting the prompts lack the insight to know what is a good approach or a bad one, they get stuck in a dynamic of asking the AI to code its way through friction over and over again. This can result in a thicket of weird abstractions, and the only design documentation for future teams is a single Markdown file that contained the instructions for an AI model used a few years back. Good luck reconstructing the architectural decisions from that! It is telling that most of the vibe coding success stories I’ve seen have been by developers who are already experts in what they are asking the LLM to build (and who are thus able to guide its work), or for situations where the stakes of failure are low. For everything else, we just have to figure out how to know if the rest of the fucking owl is any good and safe to use.
I’d be remiss if I didn’t mention one other thing that bothers me when LLM promoters invoke friction as a problem. Most of the LLM marketing in advertisements, live demos and LinkedIn posts that I’ve seen portrays a solitary engineer (or perhaps a single team) heroically using LLM-driven coding to blast out some sort of app or website and launch it quickly (our velocity and KPI is through the roof!). But industry really wants developers to use LLMs for work, where the friction is usually established processes and practices to keep defects or even poorly-conceptualized features from making it to production. Inevitably, the need to prioritize LLM-driven velocity is turned against people themselves – other engineers or team-mates in product or project management or testing or compliance or design. Because those roles are seen as friction too. Who needs user research when we can craft AI personas? Who needs design when we have AI tools to spit out web layouts? Who needs project managers when we are the managers of our army of agents? What if we didn’t need to wait for another developer to review our pull request and just automatically merge code that passes tests and scans? What if we didn’t have to spend any of our work time talking to other people and just could live in the realm of pure coding? But, software development is a collaborative process, and each member of the team helps make a good product what it is. Removing those roles or replacing them with LLM-inflected ghosts will certainly allow teams to move faster, but it doesn’t mean the products that they deliver will be better. And the process will certainly be a lot lonelier.
Perhaps my simplest reason for not using LLMs is that I just love programming so much that I don’t want to hand it off to a machine. In much the same way I wouldn’t resort to AI if I were an artist or a musician, programming is one way for me to express my creativity, and I will not cede that joy. Although it can be extremely frustrating at times, there is a profound delight in shaping something from a nebulous idea into a real system, especially if it involves an elegant implementation or interesting problems. Some evenings, I close the work laptop and open the personal laptop to dive into some new fun thing I want to build. And when I am building software professionally as part of a team, that is even better! I love the collaboration and the process of shaping software together, especially the ways in which people will step up and take ownership of problems. I don’t think the dynamic is the same when the team is just taking ownership of prompts and the LLM assistant is doing the work. Or the LLM assistant is replacing parts of the team.
Ownership is important. Over the past few decades, I’ve worked in roles where I’ve developed a strong sense of personal responsibility. As a data journalist, an error in code could lead to an embarrassing correction or a devastating lawsuit. In civic technology, errors can mean catastrophic failures in providing services and benefits, whether it’s to an entire vulnerable population or to a single person. I’m not going to say that I’ve never made mistakes, but I care a lot about getting it right because I care about the mission of the work. I have been privileged to work on teams with many other people who also care and want to do the best they can for people. An LLM can’t care. Sure, it can do a convincing job of pretending, but it’s still just a facsimile of a mind stringing together words that are more likely to be associated with each other. It’s not bothered by its mistakes or trying to do better, because it has no inner consciousness, let alone a conscience. It can never be held accountable, and I can never hand off my moral responsibilities to it for that reason.
When the LLM does well, it’s a genius that will replace all coders. When the LLM deletes all of your infrastructure or “lies” about tests, it’s your fault. After all, you just needed to structure your prompts and workflows exactly the right way so it will jostle the LLM into giving the correct output. Oops, try again. And again. Much of the LLM advice I’ve read emphasizes that you must give all the necessary instructions and amendments and codicils up front or the system will do things wrong. This mindset is a significant departure from agile programming, which emphasizes frequent course corrections and feedback and trusting in your team to do the right thing. Instead, we seem to be retreating to a new usage model similar to the time-sharing models of early computers in the 1950s. Except here, instead of walking up to hand in a sheaf of punch cards, the solitary programmer is instead bringing legal documents to be turned into programs.
I jest; there is no legal liability at play here. It’s probably not surprising given the similar demographics involved, but LLM suppliers are repeating the same dynamic as Tesla. New features are being rolled out to users without safety testing and, just as strangely, LLM boosters, like Tesla superfans, often blame themselves and others for catastrophic outcomes by saying the users should’ve done better in writing their prompts. I’m not really sure what to make of this, but it bothers me that technology is standardizing a capitalism where more risks are being borne by consumers because companies and government have both abdicated their responsibilities. We banned lawn darts after they killed a single child, but chatbots driving users to death and psychosis are accepted as the price of innovation in AI. Will things change when vibe coding itself leads to someone dying from system failures rather than dying of embarrassment?
Coding has also been my comfort when times are hard. There is research that playing Tetris is an effective way to avoid PTSD. The theory is that the therapy works because engaging the parts of the brain that handle arranging and rotating shapes hinders the formation of traumatic memories. Now, I am fortunate enough to not suffer from PTSD (and I am not making light of people who do), but I do also relate to this concept. Programming feels like a complicated puzzle and has sometimes been my solace in dark times. As the example above hints at, I know a lot about DOGE, because for the past year I’ve been building and maintaining a system to track their rampage. Unlike a work project, this has been an exercise in assembling datasets to provide clarity into an organization that wants to stay obscured. It’s been a rewarding exercise and a way for me to channel my despair into something I hope will be useful. This isn’t the only time I’ve used code as a way to work through my sadness, and it works because it is work and the process would be diminished if I only focused on the product.
This has already proven to be a much longer piece than I expected, especially since it was originally just a few short posts on Bluesky. Before I close it out, a few more quick reasons!
First, I absolutely hate the unctuous tone that AI chatbots take by default. As someone who grew up in a city on the East Coast, I get really suspicious when someone is weirdly super nice to me without me knowing them, because it usually means they’re either about to launch into a scam or proselytize to me. Reading LLM chat transcripts makes my skin crawl. Yes, I am aware I could make the LLM adopt a whole different tone, but somehow that makes the idea feel even worse.
Like many developers, I have a whole folder of draft hobby projects that have never been finished. For instance, there’s the one where I was going to write a clone of Spelling Bee, but it was going to be in Clojurescript so I could use the Blabrecs code to generate non-words and make it super frustrating. Okay, I guess that would’ve just been funny to me. You had to be there. From the LLM perspective, these are folders of failures and I could indeed use an LLM to make an app a day or whatever challenge I want. However, the process was far more important than the product (again!). Not every whimsy needs to become a reality. Often, I get more from the fun of brainstorming and the process of learning enough to know that I don’t need to continue and finish the job. It’s easy to forget this sometimes.
This wasn’t going to be an essay about the morality of using LLMs for my work. Not because I don’t care, but because many others have written far more effectively than me about the fraught implications of this technology. And at this moment where LLMs are bombing schools with children or generating child porn on demand, I really don’t feel comfortable using them. And I don’t feel comfortable not mentioning this aspect at all. It may be true that there is no ethical consumption under capitalism, but I’ll be damned if I’m not going to at least try. We can’t build a better world with tools that immiserate so many.
Weirdly, nobody seems more miserable than LLM boosters. I might be more swayed if developers were using their newfound productivity gains to finally live that 4-hour workweek that nerds were pretending to idolize 10 years ago. But perversely, it seems like many in Silicon Valley are outsourcing work to the AI agents and then using their newfound spare time to do even more work. Instead of using their time for relaxation or art or joy, they’re embracing a 9-9-6 work schedule and a hyper-quantified workplace that would make even Frederick Taylor blanch in horror. It’s possible that the LLM revolution will finally come for me and my job, but I’d rather not work myself into the grave first.
I don’t pretend to know the future. Maybe the technology will advance to such a point I will regret my lack of experience and familiarity. Or, maybe it’ll stagnate and the whole financial house of cards will come tumbling down. If that happens, I hope we can rebuild software development into the humane practice of building a better world, one line of code at a time.