Daily / 2026-06-30

The Researcher's Trainable Skill Stack: From Picking Problems to Deliberately Getting Things Wrong

Source x.com · Added 2026-06-30 06:01 · 10 min read

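AI Summary

Written by @itsreallyvivek, this piece argues that research ability is not a talent but a stack of sub-skills that can be deliberately trained. Its core arguments: pick problems you genuinely care about, because that is what manufactures originality; upgrade your information sources (read older material, read across fields, read the paper itself rather than the thread summarizing it) to break out of consensus; write ideas down to expose the gaps in your thinking; tighten the feedback loop with scripted tooling and cheap experiments; look directly at failure cases instead of only at loss curves; rotate deliberately through several subfields to find the corner where your particular quirks become an edge; and find collaborators who will tell you an idea is bad. The article draws heavily on concrete methods from Hamming, Schulman, Feynman, Darwin, Karpathy, Andrew Ng and others, and favors actionable practices (predicting experiment outcomes, overfitting a single batch, manually analyzing a hundred failure cases) over abstract advice. It suits research engineers and PhD students who want to move from surface imitation to real productivity, especially practitioners working where machine learning meets systems engineering.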

§ 1

nobody really teaches you research. you get a desk, a problem someone else picked, and a vague instruction to produce something novel. so most people reverse-engineer the job from what they can see, which is papers, threads, and announcements, and what they end up learning is how to look like a researcher rather than how to be one. the actual skill is a stack of smaller skills, and almost every one of them can be deliberately trained.

§ 2

richard hamming had a habit at bell labs that made him unpopular at lunch. he'd ask whoever sat near him what the important problems in their field were, then ask why they weren't working on them. people changed tables. the question stings because most of us have no good answer. we don't choose problems, we absorb them, from an advisor, from whatever a big lab announced last quarter, from the paper everyone is quote-tweeting this week.

the trouble with an absorbed problem is that you hold the conclusion without the reasoning. you know some famous lab cares about a direction, but not why, not what they expect to find, not what would make them drop it. when they pivot, you find out a year later. and on a problem that's already fashionable, you're racing a thousand people who started earlier and have more compute than you.

§ 3

john schulman's guide to ml research splits the work into two modes. in one, you read the literature and hunt for things to improve. in the other, you choose an outcome you genuinely want to exist and reason backwards to the experiments. he argues for the second, and the quiet reason is that it manufactures originality. a goal you actually care about will drag you into territory no survey paper covers.

§ 4

taste, meanwhile, gets discussed like a gift. it behaves more like a muscle. predict the result of every experiment before you run it. cover a paper's results section and guess the numbers from the method alone. mark down which of this month's releases will matter in two years and check your hit rate later. a forecast plus a correction, repeated a few hundred times, is how every good model gets trained, including the one in your head.
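
A minimal sketch of that forecast-plus-correction loop, assuming you write each prediction down as a probability before the result exists. The `Forecast` record, the 0.5 cutoff for counting a "hit", and the Brier score as a calibration measure are illustrative choices, not something the post prescribes; the entries in the usage block are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Forecast:
    """One prediction written down *before* the experiment runs (hypothetical schema)."""
    claim: str                      # what you expect to happen
    p_true: float                   # your stated probability that the claim holds
    outcome: Optional[bool] = None  # filled in later, once the result is known


def _resolved(forecasts: list[Forecast]) -> list[Forecast]:
    done = [f for f in forecasts if f.outcome is not None]
    if not done:
        raise ValueError("no resolved forecasts yet")
    return done


def hit_rate(forecasts: list[Forecast]) -> float:
    """Fraction of resolved forecasts where the >= 0.5 call matched the outcome."""
    done = _resolved(forecasts)
    return sum((f.p_true >= 0.5) == f.outcome for f in done) / len(done)


def brier_score(forecasts: list[Forecast]) -> float:
    """Mean squared gap between stated probability and outcome (0 is perfect)."""
    done = _resolved(forecasts)
    return sum((f.p_true - float(f.outcome)) ** 2 for f in done) / len(done)


if __name__ == "__main__":
    log = [
        Forecast("lr=3e-4 beats lr=1e-3 on val loss", 0.7, outcome=True),
        Forecast("doubling width adds >1pt accuracy", 0.6, outcome=False),
        Forecast("this month's release still matters in 2 years", 0.2),  # unresolved
    ]
    print(f"hit rate: {hit_rate(log):.2f}")
    print(f"brier:    {brier_score(log):.3f}")
```

The Brier score punishes confident misses more than the raw hit rate does, since a 0.95 call that turns out false costs far more than a 0.55 one.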

§ 5

shared reading lists produce shared ideas. if your information diet is the trending page of arxiv plus whatever survives the group chat filter, you will reliably reach the same conclusions as everyone else, at the same time, which makes those conclusions worth approximately nothing.

§ 6

old material is criminally underpriced. this field reruns its own past on a delay: mixture of experts dates to 1991, lstms to 1997, backprop went mainstream in 1986. rich sutton needed about a thousand words in 2019 to write the bitter lesson, and it predicts the shape of the field better than surveys ten times its length. claude shannon gave a talk on creative thinking in 1952 where his opening move was to shrink a problem until it's nearly trivial, crack the small version, then reintroduce the difficulty one piece at a time. that single trick will carry you through more walls than any modern productivity advice.

§ 7

range matters as much as depth. interpretability borrows shamelessly from neuroscience. eval design is mechanism design wearing a lab coat. a working sense of how gpus actually move memory tells you which architecture papers are doomed before the benchmarks do. and honest statistics might be the rarest skill in ml, where a lot of published rigor is vibes with error bars.
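
One concrete form of honest statistics: before claiming a gain, check whether it survives resampling over random seeds. A minimal sketch, assuming you have one final score per seed for each method; the percentile bootstrap is one standard option among several, and the per-seed numbers below are placeholders, not results.

```python
import random
import statistics


def bootstrap_diff_ci(a, b, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b).
    a and b hold one score per random seed; seeds are resampled with replacement."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        ra = [rng.choice(a) for _ in a]
        rb = [rng.choice(b) for _ in b]
        diffs.append(statistics.fmean(ra) - statistics.fmean(rb))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


if __name__ == "__main__":
    # Placeholder per-seed accuracies, not real results.
    method = [71.2, 69.8, 72.5, 70.1, 68.9]
    baseline = [70.4, 70.9, 69.7, 71.3, 70.2]
    lo, hi = bootstrap_diff_ci(method, baseline)
    print(f"95% CI for mean difference: [{lo:.2f}, {hi:.2f}]")
    if lo <= 0 <= hi:
        print("interval includes 0: the 'gain' may just be seed noise")
```

With five seeds the interval is wide; that width is the honest part, and it is exactly what a single best-seed number hides.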

§ 8

one more thing. read the paper itself, not the thread summarizing it. the appendix is where the bodies are buried, and the limitations section is usually the most honest paragraph in the document.

§ 9

paul graham points out that an idea can feel fully formed right up until you try to put it into words. the page finds gaps your head papers over: the assumption you never tested, the step that doesn't actually follow, the two claims that quietly contradict each other.

§ 10

feynman's rule was that the first person you must avoid fooling is yourself, because you're the easiest target. writing is the cheapest defense ever invented. darwin went further and made it procedural. any fact that cut against his theory got written down on the spot, because he'd caught his own memory deleting inconvenient evidence faster than the convenient kind. your memory does the same thing to your failed runs. keep a log: hypothesis, setup, expectation, result, updated belief. rereading last month's entries is humbling in a way no reviewer can match.
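
The log described here can be as simple as an append-only file. A sketch assuming a JSON Lines file and a hypothetical `LogEntry` schema made of the five fields listed above plus a timestamp; append-only is the design choice that matters, because, as with Darwin's notes, the point is that inconvenient results cannot quietly disappear.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class LogEntry:
    """One experiment: hypothesis, setup, expectation, result, updated belief (hypothetical schema)."""
    hypothesis: str
    setup: str
    expectation: str            # written down *before* the run
    result: str = ""            # filled in after the run
    updated_belief: str = ""    # what you now believe, especially after a failure
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_entry(path: Path, entry: LogEntry) -> None:
    """Append-only: old entries are never edited, so failed runs stay on record."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry), ensure_ascii=False) + "\n")


def read_entries(path: Path) -> list[LogEntry]:
    """Load all entries back, e.g. to reread last month's log."""
    with path.open(encoding="utf-8") as f:
        return [LogEntry(**json.loads(line)) for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "research_log.jsonl"  # in real use, a file you keep
        append_entry(log, LogEntry(                              # illustrative entry
            hypothesis="warmup matters for the small model",
            setup="same config, warmup 0 vs 500 steps, 3 seeds",
            expectation="no-warmup run diverges early",
        ))
        for e in read_entries(log):
            print(e.timestamp, "|", e.hypothesis, "->", e.expectation)
```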

§ 11

then put some of it in public. olah and carter's research debt essay makes the case that fields choke on undigested ideas, and that a clear explanation is a genuine contribution rather than a service job. a lot of people working in interpretability today found the field through readable posts, not conference papers. a body of public writing also doubles as the strongest credential you can hold, because it's an unfakeable sample of how you think.

§ 12

the stories about alec radford rarely involve a single stroke of genius. they involve volume. more runs per day, more wrong ideas discarded per week, a model of reality that updated faster than anyone else's. that's the actual game. research speed is mostly the speed at which you discover you're wrong.

§ 13

which makes tooling a first-class research activity. launching a run should be one command. plotting it should be one more. every experiment should be reproducible from its config, and comparing two runs should take seconds, not an afternoon of archaeology. karpathy's recipe for training neural networks has a step that pays for itself a hundred times over: overfit a single batch before training at scale. thirty seconds, half your bugs, gone. shrink everything until it's cheap, get it right, then spend the compute.
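
Karpathy's single-batch check, sketched in plain Python so it runs without a framework. Assumptions: a toy logistic-regression model stands in for a real network, and the batch size, learning rate, step count and 0.05 loss threshold are arbitrary illustrative values. What carries over is the shape of the check: fit one small fixed batch, and if the loss will not go to near zero, suspect the code before the idea. The `flip_sign` flag simulates a classic bug so you can watch the check fail.

```python
import math
import random


def make_batch(n=8, dim=4, margin=0.3, seed=0):
    """One small, fixed batch whose labels follow a hidden linear rule,
    so a correctly implemented linear model is *able* to fit it."""
    rng = random.Random(seed)
    true_w = [rng.uniform(-1, 1) for _ in range(dim)]
    xs, ys = [], []
    while len(xs) < n:
        x = [rng.uniform(-1, 1) for _ in range(dim)]
        score = sum(w * xi for w, xi in zip(true_w, x))
        if abs(score) >= margin:  # keep points clearly on one side of the boundary
            xs.append(x)
            ys.append(1.0 if score > 0 else 0.0)
    return xs, ys


def sigmoid(z):
    # numerically stable for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(p, y, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def train_on_one_batch(xs, ys, lr=0.8, steps=6000, flip_sign=False):
    """Full-batch gradient descent on the single batch; returns loss per step.
    flip_sign=True simulates a bug: ascending the loss instead of descending."""
    n, dim = len(xs), len(xs[0])
    w, b = [0.0] * dim, 0.0
    sign = -1.0 if flip_sign else 1.0
    history = []
    for _ in range(steps):
        preds = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in xs]
        history.append(sum(bce(p, y) for p, y in zip(preds, ys)) / n)
        # d(mean BCE)/d(logit_i) = (p_i - y_i) / n
        grads = [(p - y) / n for p, y in zip(preds, ys)]
        for j in range(dim):
            w[j] -= sign * lr * sum(g * x[j] for g, x in zip(grads, xs))
        b -= sign * lr * sum(grads)
    return history


def can_overfit(history, tol=0.05):
    """The sanity check: a working pipeline drives one batch's loss near zero."""
    return history[-1] < tol


if __name__ == "__main__":
    xs, ys = make_batch()
    ok = train_on_one_batch(xs, ys)
    bug = train_on_one_batch(xs, ys, flip_sign=True)
    print(f"correct update: final loss {ok[-1]:.4f}, passes={can_overfit(ok)}")
    print(f"sign-flip bug:  final loss {bug[-1]:.4f}, passes={can_overfit(bug)}")
```

In a real framework the same check is usually your normal training loop pointed at one cached batch; if it cannot memorize eight examples, scaling up will only hide the bug.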

§ 14

and retire the idea that engineering is the junior partner here. at the frontier the two jobs have fused. the researcher who can build the harness, the eval, and the data pipeline is the one whose hypotheses actually get tested. everyone else is waiting in a queue.

§ 15

a descending loss curve is not analysis, it's reassurance. your experiments throw off far more information than you consume: transcripts, failure cases, the strange tail of the distribution. most of it dies unread in a logs folder.

§ 16

karpathy's recipe starts before any training code gets written, with hours spent on the raw data by hand. most ml bugs live in the data, and they fail silently. nothing crashes. you simply get a mediocre model and a wrong theory about why.
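
Nothing replaces reading the examples yourself, but a few cheap counts can tell you where to read first. A minimal sketch, assuming each example is a dict with hypothetical `text` and `label` keys; the checks (empty inputs, exact duplicates, the same input carrying conflicting labels, label balance) are common silent-failure sources picked for illustration, not a list taken from Karpathy's recipe.

```python
import random
from collections import Counter, defaultdict


def audit(examples, k=5, seed=0):
    """Cheap first pass over raw examples before any training code exists.
    Assumes each example is a dict with 'text' and 'label' keys (hypothetical schema)."""
    texts = [ex["text"] for ex in examples]
    labels_by_text = defaultdict(set)
    for ex in examples:
        labels_by_text[ex["text"]].add(ex["label"])
    rng = random.Random(seed)
    return {
        "n": len(examples),
        "label_counts": Counter(ex["label"] for ex in examples),
        "empty": sum(1 for t in texts if not t.strip()),
        "exact_duplicates": len(texts) - len(set(texts)),
        # identical inputs with different labels: a classic silent data bug
        "conflicting_labels": sorted(t for t, ls in labels_by_text.items() if len(ls) > 1),
        # a random sample to actually read, not just count
        "sample": rng.sample(examples, min(k, len(examples))),
    }


if __name__ == "__main__":
    data = [  # toy examples, illustrative only
        {"text": "great movie", "label": "pos"},
        {"text": "great movie", "label": "neg"},
        {"text": "   ", "label": "pos"},
        {"text": "terrible plot", "label": "neg"},
        {"text": "terrible plot", "label": "neg"},
    ]
    report = audit(data, k=3)
    for key in ("n", "label_counts", "empty", "exact_duplicates", "conflicting_labels"):
        print(f"{key}: {report[key]}")
    for ex in report["sample"]:
        print("read this:", ex)
```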

§ 17

andrew ng has taught the same unglamorous move for over a decade because nothing beats it. pull a hundred failures, read all of them, sort them into piles, attack the biggest pile. it works on models and it works on evals, where a benchmark you've never read transcripts from is a benchmark you don't actually understand. one transcript of genuinely strange behavior will teach you more than the next decimal of accuracy ever will.
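
The bookkeeping half of that exercise, sketched under stated assumptions: `predict` and `tag_fn` are hypothetical stand-ins, and in practice the tags are your own notes after reading each failure. The code only samples failures at random and counts the piles, biggest first.

```python
import random
from collections import Counter


def sample_failures(examples, predict, n=100, seed=0):
    """Pull up to n failures at random rather than the first n in the file."""
    failures = [ex for ex in examples if predict(ex["input"]) != ex["target"]]
    rng = random.Random(seed)
    return rng.sample(failures, min(n, len(failures)))


def sort_into_piles(failures, tag_fn):
    """Count how many failures land in each pile, biggest pile first.
    tag_fn stands in for a person reading a case and naming what went wrong;
    one failure can sit in several piles, so counts may sum to more than n."""
    piles = Counter()
    for ex in failures:
        piles.update(set(tag_fn(ex)))
    return piles.most_common()


if __name__ == "__main__":
    # Toy data and a deliberately naive keyword "model"; both are illustrative.
    data = [
        {"input": "not good at all", "target": "neg"},
        {"input": "good grief, awful", "target": "neg"},
        {"input": "loved it", "target": "pos"},
        {"input": "great fun", "target": "pos"},
        {"input": "good", "target": "pos"},
    ]

    def predict(text):
        return "pos" if "good" in text else "neg"

    def tag_fn(ex):
        # In practice these tags are your notes after reading the case.
        tags = []
        if "not " in ex["input"]:
            tags.append("negation")
        if ex["target"] == "pos" and "good" not in ex["input"]:
            tags.append("synonym the model doesn't know")
        return tags or ["unsorted"]

    for pile, count in sort_into_piles(sample_failures(data, predict), tag_fn):
        print(f"{count:3d}  {pile}")
```

Sampling at random rather than taking the first hundred failures matters when the data file is sorted by source or date, since the first hundred would then tend to share one cause.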

§ 18

your first subfield is an accident of timing, so treat it like one. spend real time in interpretability, in evals, in rl, in systems, before deciding where you live. somewhere in this field is a corner where your specific weirdness is an unfair advantage, and the only way to locate it is to pay tuition in several places. nobody waives the tuition.

§ 19

run the disposable version of every idea first and let most of them die young. tune your baselines until it hurts, because the graveyard of ml is full of gains that evaporated against a properly tuned baseline, and a reviewer is the worst possible person to learn that from. ablate until you know which component carries the result. it's usually one, and it's usually not the one in the title.
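
A leave-one-out ablation loop, sketched with a toy stand-in for evaluation. Assumptions: `evaluate` is hypothetical and would in practice train and score a model with only the given components switched on; the component names and score contributions below are invented purely to exercise the loop. Leave-one-out is the simplest ablation design and can miss components that only matter together, so interacting parts may need pairwise ablations too.

```python
from typing import Callable, Iterable


def leave_one_out(components: Iterable[str],
                  evaluate: Callable[[frozenset], float]) -> list[tuple[str, float]]:
    """Drop each component in turn and record how much the score falls.
    Returns (component, drop) pairs, largest drop first."""
    full = frozenset(components)
    base = evaluate(full)
    drops = [(c, base - evaluate(full - {c})) for c in full]
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy evaluator with invented numbers, standing in for "train and score".
    def evaluate(enabled: frozenset) -> float:
        score = 60.0
        if "tuned_lr_schedule" in enabled:
            score += 4.0
        if "extra_data_aug" in enabled:
            score += 0.5
        if "novel_attention" in enabled:
            score += 0.3
        return score

    parts = ["novel_attention", "tuned_lr_schedule", "extra_data_aug"]
    for name, drop in leave_one_out(parts, evaluate):
        print(f"without {name:18s} score drops by {drop:.1f}")
```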

§ 20

breadth is also insurance. subfields saturate, all of them, usually right after they peak on twitter. the people who keep producing through those transitions are the ones who already know their way around the neighboring territory.

§ 21

hamming noticed a pattern in who ended up doing important work. colleagues with closed office doors got more done in any given year, and colleagues with open doors did the work that mattered, because the interruptions carried information about what the world actually needed. your open door is probably an inbox. keep it that way.

§ 22

generosity compounds in research like nothing else. replicate a result and publish what you find. release the tool you built for yourself. explain something hard in plain language. the returns arrive sideways, months later, as the collaboration or the reference or the role you couldn't have applied for. float your half-formed ideas in public too, because being wrong on the timeline is far cheaper than being wrong in print. and the collaborator who tells you an idea is bad before you sink three months into it is worth more than compute. that relationship can't be bought, only earned.

§ 23

pasteur said luck favors the prepared mind, and hamming built a whole career philosophy on top of it: knowledge and productivity compound like interest. the daily edges look trivial in isolation. what you read, what you record, how fast your loop runs, who you argue with. give them a few years and they produce careers that look like luck from the outside. start compounding earlier than feels necessary. future you already knows this was the cheap part.

Tags

AI engineering · career advice · experiment design · research methodology

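Glossary

ablate: In machine learning experiments, systematically remove or disable one component of a model (such as a layer or a feature) to measure its specific contribution to the final result
overfit a single batch: A debugging method recommended by Karpathy. Train on a tiny amount of data and make sure the model can fit it completely first, so bugs surface before training rather than after a long run on the full dataset
baseline tuning: Before proposing a new method, put serious effort into tuning the hyperparameters and configuration of simple baseline models, because many self-declared innovations evaporate against a strong baseline
research debt: A concept from Olah and Carter. A field accumulates a backlog of ideas that were never digested or clearly explained, which holds back progress; writing a clear, accessible explanation is a genuine contribution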