Beyond Writing Code: Codex Durable Threads, Verified Goals, and Automation at a Glance

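Source: x.com · Collected 2026-06-03 06:00 · 12 min read
AI summary

This article shows how to extend Codex from a single-purpose coding assistant into a multi-tool system built around durable work threads. Readers will learn how to: keep context across sessions with pinned threads and shortcuts (Command-1 through Command-9); dictate rough ideas by voice and let the agent organize them; adjust direction mid-run with steering and queuing; set up heartbeat-style thread automations (such as periodic Slack and Gmail checks); and define long-running Goals verified by tests. In addition, the side panel supports in-place review of many kinds of artifacts, and an Obsidian vault serves as a shared memory layer that records decisions across threads. It suits engineers who want to integrate an AI assistant deeply into their daily workflow.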

§ 1

Getting the most out of Codex

Most developers first use coding agents for code: inspect a repository, make a diff, run tests, and open a pull request.

That’s still the center of gravity for Codex. But much of the work on a computer is already mediated by code: executing shell commands, browsing web pages, calling APIs, exporting documents, responding to events, and triggering automations. As those surfaces become available to Codex, it starts to feel less like a coding assistant in the narrow sense and more like a system for getting computer work done.

§ 2

The Codex app makes that shift concrete. A thread can keep context, use tools, surface artifacts, and continue across prompts instead of resetting after each exchange.

Getting more out of Codex means using these capabilities together:

  • durable threads that preserve context
  • voice, steering, and queuing while the user is still in the loop
  • browser, computer-use, MCP servers, and connectors that let Codex act beyond a repo
  • thread automations and Goals that continue the work while the user is away
  • the side panel, where users can review code, documents, decks, and other artifacts

§ 3

Durable threads: Long-running Codex threads that preserve working context across repeated sessions.

Pinned threads are one way to keep durable threads close at hand. They’re useful for recurring work streams such as:

  • a Chief of Staff thread
  • a release thread
  • a documentation review thread
  • a thread dedicated to external monitoring

These are persistent workspaces, not short chats. Codex can revisit them over time, preserving prior decisions, preferences, and working context that would otherwise need to be rebuilt from scratch.

Pinned-thread shortcuts make this practical. Command-1 through Command-9 jump directly into saved threads.

§ 4

Voice input is valuable because it captures the rough version of a thought before it’s compressed into polished prose.

Codex has built-in voice input. It works especially well for vague starting points that are natural to say but awkward to type:

I think someone named Ben mentioned this in Slack. I do not remember the details. Please go look.

For an agent that can search, gather context, and report back, that’s often enough.

It also works well for a two- or three-minute thought dump before the task is fully formed.

Transcripts work the same way. A raw meeting transcript or dictated planning note often provides better source material than a short summary because it preserves uncertainty, emphasis, and unfinished lines of thought.

§ 5

Voice becomes even more useful when paired with explicit control over an active task.

Steering: Interrupting an in-flight Codex task with new direction before the current step finishes.

Steering is useful when the agent is heading the wrong way and needs a correction before it finishes. During a website review, for example, the user can interrupt the work while annotating the surface in the side panel:

  • make this smaller
  • the spacing between these two elements feels off
  • this copy is wrong

Queuing: Adding work for Codex to do after the current step completes.

Queuing is different. It doesn’t interrupt the task in progress. It adds the next task to the line. A user might say:

Once the work is done, send the preview link to the reviewer in Slack.

Steering changes what Codex is doing now. Queuing changes what should happen next. Both keep the user close to the work while it’s unfolding.
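
The difference between the two controls can be made concrete with a toy model. The sketch below is not Codex's API; `ThreadControl` and its methods are hypothetical names used only to illustrate that steering amends the in-flight task while queuing appends work behind it.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class ThreadControl:
    """Toy model of one agent thread: a current task plus a FIFO of pending work."""

    current: Optional[str] = None
    pending: Deque[str] = field(default_factory=deque)
    log: List[str] = field(default_factory=list)

    def start(self, task: str) -> None:
        self.current = task
        self.log.append(f"start: {task}")

    def steer(self, correction: str) -> None:
        # Steering acts on the in-flight task: it is amended right away.
        if self.current is None:
            raise RuntimeError("nothing is running to steer")
        self.current = f"{self.current} [steered: {correction}]"
        self.log.append(f"steer: {correction}")

    def queue(self, task: str) -> None:
        # Queuing leaves the in-flight task untouched and appends to the line.
        self.pending.append(task)
        self.log.append(f"queue: {task}")

    def finish_current(self) -> Optional[str]:
        # Complete the current task, then pull the next queued one, if any.
        done = self.current
        self.log.append(f"done: {done}")
        self.current = self.pending.popleft() if self.pending else None
        if self.current is not None:
            self.log.append(f"start: {self.current}")
        return done


thread = ThreadControl()
thread.start("review the landing page")
thread.queue("send the preview link to the reviewer in Slack")
thread.steer("make the hero image smaller")
finished = thread.finish_current()
```

After `finish_current()`, the steered review is the finished task and the queued Slack message becomes the current one, which mirrors the "now" versus "next" distinction.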

§ 6

Once a thread has continuity, the next question is what it can act on. Codex can move outward in layers:

  • $browser for the in-app browser in the side panel, where Codex can inspect and annotate web surfaces
  • @chrome for signed-in browser state and Chrome-based workflows
  • @computer for work that only exists through a desktop GUI

$browser fits side-panel browser review. @chrome fits signed-in browser work that depends on the user’s Chrome context. @computer fits tasks that only exist through a desktop GUI.

MCP servers and connectors extend the same idea into the rest of a workflow. Slack, Gmail, and Calendar matter because many important tasks first appear as messages, inbox items, or scheduling problems before they ever become code.

Skills make repeated workflows reusable. Once a workflow proves useful, package it as a skill so Codex can run it again without relearning the routine from scratch.

§ 7

The Codex mobile app changes when the user has to be at the desk. A task can start on a Mac where the files, permissions, and local setup already live, then continue while the user checks in from a phone.

That matters in small moments. Someone can leave the desk while Codex runs a longer task, answer a question from outside, approve the next step, or redirect the thread before they get back. The local environment stays in place; the user doesn’t have to.

§ 8

Automations run Codex work on a schedule. Use a scheduled automation when the recurring job should start fresh from a workspace, such as a daily report or a regular repository check. Use a thread automation when the schedule should return to an active conversation with its running context.

Thread automations: Heartbeat-style recurring wake-up calls that return to the same Codex thread on a schedule.

Pinned threads are useful, but they still wait for the user to return. A thread automation can check on something every few minutes or every few hours, continue until it meets a condition, and adjust the cadence over time.
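
As a rough illustration of the heartbeat idea, the sketch below wakes up on a schedule, stops once a condition is met, and slows its cadence after each quiet check. It does not reflect how Codex implements thread automations; `run_heartbeat`, its parameters, and the multiplicative backoff are assumptions made for the example.

```python
import time
from typing import Callable, List, Tuple


def run_heartbeat(
    check: Callable[[], bool],
    interval_s: float,
    max_interval_s: float,
    max_wakeups: int,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, List[float]]:
    """Wake up on a schedule until `check` reports that the condition is met.

    Returns (condition_met, intervals actually waited).
    After each quiet wake-up the interval grows by `backoff`, capped at
    `max_interval_s` (the cadence policy is an assumption for this sketch).
    """
    waits: List[float] = []
    for _ in range(max_wakeups):
        if check():
            return True, waits
        sleep(interval_s)
        waits.append(interval_s)
        interval_s = min(interval_s * backoff, max_interval_s)
    return False, waits


# Hypothetical scenario: the third wake-up finds a new reviewer comment.
replies = iter([False, False, True])
met, waits = run_heartbeat(
    check=lambda: next(replies),
    interval_s=300,
    max_interval_s=900,
    max_wakeups=5,
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
```

Injecting `sleep` keeps the loop testable; a real automation would rely on the app's scheduler rather than a blocking loop.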

A Chief of Staff thread might run every 30 minutes:

Every 30 minutes, check Slack and Gmail for unanswered messages that need my attention. Help me prioritize what matters most. If someone asks me a question, research the answer as deeply as you can and draft a reply for me, but do not send it.

When the user returns, the expensive part of gathering context is often done. The human still decides what gets sent.

Thread automations also fit feedback loops. A thread automation can watch pull request comments, Google Docs comments, or Slack replies and keep the surrounding work moving while the user is away.

Consider an animation workflow where a reviewer shares a video in Slack. A thread automation can check the thread on a schedule, render an updated version when comments arrive, and reply in the same thread tagging the reviewer. If one integration can’t complete the final upload, desktop automation can finish the step through the GUI.

The loop spans Slack for feedback, the codebase for rendering, and desktop automation for the final upload.

§ 9

Goals: Longer-running Codex tasks with a finish line the agent can keep working toward over time.

Goals are most powerful when the task has a real finish line that the agent can keep pushing toward. A weak goal is:

Implement the plan in this Markdown file.

A stronger goal has a measurable success criterion.

For example, an engineer might migrate an internal tool from Python to Rust by setting up the new directory, defining the goal, and making the finish line explicit: the new implementation isn’t done until the unit tests pass.

A goal combines ongoing execution with a verifier. The user defines the outcome, the stopping condition, and the signal that says whether Codex is getting closer.

Useful verifiers include:

  • a test suite
  • a benchmark
  • a bug reproduction
  • a validation matrix
  • an end-to-end workflow that must keep passing

Ambition matters, but without verification it’s just a wish.
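
The pairing of ongoing execution with a verifier can be sketched as a small loop. This is an illustrative model only, not the Goals feature itself: `pursue_goal`, `GoalResult`, and the scoring convention (the fraction of tests passing) are hypothetical, and a real verifier would invoke an actual test suite or benchmark.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GoalResult:
    done: bool
    attempts: int
    history: List[float]  # verifier score recorded after each attempt


def pursue_goal(
    attempt: Callable[[int], None],
    verifier: Callable[[], float],
    target: float = 1.0,
    max_attempts: int = 10,
) -> GoalResult:
    """Repeat `attempt` until `verifier` reaches `target` or the budget runs out.

    `verifier` returns a score in [0, 1], for example the fraction of unit
    tests passing. The score is both the stopping condition and the signal
    that shows whether the work is getting closer.
    """
    history: List[float] = []
    for n in range(1, max_attempts + 1):
        attempt(n)
        score = verifier()
        history.append(score)
        if score >= target:
            return GoalResult(done=True, attempts=n, history=history)
    return GoalResult(done=False, attempts=max_attempts, history=history)


# Simulated migration: each attempt makes one more of four tests pass.
TOTAL_TESTS = 4
passing = {"count": 0}


def port_one_module(attempt_number: int) -> None:
    passing["count"] = min(TOTAL_TESTS, passing["count"] + 1)


result = pursue_goal(port_one_module, lambda: passing["count"] / TOTAL_TESTS)
```

Without the `verifier` argument the loop would have no way to stop early or to report progress, which is the practical gap between a weak goal and a strong one.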

§ 10

The side panel keeps the work beside the conversation that produced it. Instead of exporting an artifact and switching contexts, the user can review it in place. The output might be code, but it might also be a deck, a PDF, a browser page, a table, or another artifact created along the way.

It supports four jobs especially well:

  1. Inspect artifacts
  2. Annotate what needs to change
  3. Operate web surfaces
  4. Review changes

The side panel lets users review Markdown, spreadsheets, data tables, documents, and slides in place. They can inspect, mark up, and revise artifacts without breaking the loop.

The deck or PDF can stay open beside the thread that produced it, ready for direct review and repair.

The in-app browser lets Codex inspect a rendered page, control it, and respond to annotations directly on the surface under review. Comments on a page or artifact stay inside the working loop instead of becoming a separate handoff.

The web becomes both output and control surface. Codex can build an artifact, open it in the side panel, inspect it, debug it, and keep refining the same object in place.

These surfaces work especially well:

  • index.html for lightweight static artifacts
  • Storybook for UI review
  • Remotion Studio for programmatic animation
  • browser-based slide decks for presentations
  • data apps for analysis workflows

A single index.html file can become a durable interactive artifact with no server required. Thread automations can also refresh static artifacts over time so a thread has something new waiting when the user returns.
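
One way to picture a refreshable static artifact is a script that renders current data into a single self-contained HTML string. The function name `render_status_page` and the sample rows below are hypothetical; the sketch only shows that such a page needs no server and can be regenerated on each run.

```python
from html import escape
from typing import List, Tuple


def render_status_page(title: str, rows: List[Tuple[str, str]]) -> str:
    """Render (item, status) rows into one self-contained HTML document."""
    body = "\n".join(
        # escape() keeps user-supplied text from being interpreted as markup
        f"      <tr><td>{escape(name)}</td><td>{escape(status)}</td></tr>"
        for name, status in rows
    )
    return f"""<!doctype html>
<html lang="en">
  <head><meta charset="utf-8"><title>{escape(title)}</title></head>
  <body>
    <h1>{escape(title)}</h1>
    <table>
{body}
    </table>
  </body>
</html>
"""


page = render_status_page(
    "Release status",
    [("docs review", "done"), ("<perf> benchmark", "blocked")],
)
```

Writing `page` to a file named index.html yields an artifact that opens directly from disk; rerunning the script with fresh rows is the refresh step.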

§ 11

Long-running threads become more useful when they share memory outside any one conversation.

Shared memory: Durable context stored outside a single thread so future work can resume from something explicit and reviewable.

One durable pattern is to anchor persistent threads in an Obsidian vault. In practice, that means a folder of plain files that stays straightforward to inspect, edit, move, and keep for a long time. Teams can store that folder in cloud storage, Git, Dropbox, Google Drive, or another sync layer that fits their workflow.

A vault might look like this:

vault/
├── TODO.md
├── people/
├── projects/
├── agent/
└── notes/

At the top level, AGENTS.md can define how Codex should update that workspace as it learns more about people, projects, decisions, and open loops.

Don’t copy one exact vault structure. Teach the agent where durable context should live, what context to preserve, and when not to create churn.

A practical AGENTS.md might say:

  • Treat ~/vault as durable work memory.
  • Prefer canonical notes over note sprawl.
  • Route TODOs, people, projects, daily summaries, and scratch notes explicitly.
  • Preserve decisions, blockers, owners, dates, and useful links.
  • If nothing meaningful changed, do not churn the vault.
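
The routing and no-churn rules above could be enforced by a small helper, sketched here under assumptions: the `ROUTES` mapping from note kind to vault location is invented for illustration (the source does not say where each kind of note lives), and `route` and `write_if_changed` are hypothetical names.

```python
import tempfile
from pathlib import Path

# Assumed mapping from note kind to a location inside the vault.
ROUTES = {
    "todo": "TODO.md",
    "person": "people",
    "project": "projects",
    "note": "notes",
}


def route(vault: Path, kind: str, name: str = "") -> Path:
    """Return the canonical file for a note kind, so notes do not sprawl."""
    target = ROUTES[kind]
    if target.endswith(".md"):
        return vault / target  # single canonical file, e.g. TODO.md
    return vault / target / f"{name}.md"


def write_if_changed(path: Path, text: str) -> bool:
    """Write `text` only when it differs from what is on disk.

    Returns True if the file was written, False if nothing changed.
    """
    if path.exists() and path.read_text(encoding="utf-8") == text:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    note = route(Path(tmp), "project", "rust-migration")
    summary = "Decision: done only when unit tests pass.\n"
    first_write = write_if_changed(note, summary)
    second_write = write_if_changed(note, summary)  # identical content: skipped
```

Skipping identical writes keeps file timestamps and sync or Git history quiet when a run learns nothing new.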

Repositories hold code. The vault holds rolling context: the people involved, what changed, what’s blocked, what needs follow-up, and what would otherwise disappear between sessions.

Important context shouldn’t live only inside a conversation transcript. Write it down somewhere the next thread can pick back up.

Codex also has first-party memory features in Settings > Personalization > Memories. They provide a local recall layer for preferences, recurring workflows, and known pitfalls. They complement explicit written context rather than replacing it. Chronicle pushes in the same direction by helping Codex build memory from recent screen context.

§ 12

Codex still starts from code. But more of the work around code is now reachable through the same system: MCP servers, browser surfaces, desktop controls, thread automations, and reviewable artifacts.

That changes the control model. Steering interrupts the work in progress. Queuing lines up the next task. Thread automations keep a thread active when the user steps away. Goals add a concrete finish line that Codex can keep working toward.

Codex can now carry a workflow from instruction to execution to artifact review, even when the work leaves the repo.
