Decomposing the agent harness into swappable workers: the iii engine architecture

Source x.com Glean’d 2026-06-16 06:01 Read 20 min
AI summary

Mike Piccolo argues that monolithic agent frameworks force a tradeoff by bundling the loop, tools, memory, and orchestration into one block, which long-running teams inevitably rewrite. He walks through the iii engine's production worker stack, where all thirteen harness responsibilities—credential resolution, policy checks, turn FSM, session persistence, budget tracking, etc.—are decomposed into 11 independently replaceable workers. Each worker connects to the engine via WebSocket and registers functions and triggers using a single primitive (iii.trigger()), making the harness a composable set of installable workers. The post provides a step-by-step trace of a turn through provisioning, streaming, policy-gated tool dispatch, and reactive approval wake-ups, alongside concrete examples of swapping the model catalog, adding a provider, or integrating a Slack approval surface. The core bet: an agent harness should be a slider of composable workers rather than a framework you fork. This is for backend engineers building or scaling custom agent infrastructure who are hitting the composability limits of existing frameworks.

§ 1

Most agent teams don't build a harness. They adopt one. LangChain, LangGraph, OpenAI Agents SDK, Anthropic SDK, CrewAI, AutoGen: the loop, the tools, the memory, and the orchestration are picked off the shelf as a single decision. The harness is a framework you import. If something inside it doesn't fit, you fork it, fight it, or work around it.

§ 2

I think that shape is wrong, and it's the reason every long-running agent team eventually ends up rewriting its harness from scratch. The harness isn't one thing. It's ten or twelve different things bundled together because the surrounding ecosystem doesn't give you a way to compose them. Pi agent packages are on the right track, but they are still in the paradigm of “add another service and integrate it with all the others.” The iii engine treats all workers the same and removes the integration logic completely. The provider router, the credential vault, the policy engine, the approval gate, the model catalog, the session storage, the budget tracker, the after-call hook fanout, and the durable turn loop are independent concerns. They are all interoperable with your queue, HTTP/API server, streaming, and even browser workers. A framework that ships them as one block is selling you a tradeoff you didn't have to make.

§ 3

The bet underneath iii is that they shouldn't be one block. They should be a set of workers on a shared engine, each replaceable, each versioned independently, each connected by a single primitive: a trigger (iii.trigger()) that every other worker also uses. The harness becomes a stack of installable workers, and "build your own" stops meaning "fork a framework." It means "swap a few workers."
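
To make the "single primitive" idea concrete, here is a minimal sketch of workers registering functions on a shared bus and calling each other through one trigger call. The `Bus` class is a hypothetical in-memory stand-in written for this illustration, not the iii SDK, and `provider::demo::complete` is a made-up function id; only `auth::get_token` comes from the article. The real engine connects workers over WebSocket, and this sketch is synchronous for brevity.

```typescript
// Hypothetical in-memory stand-in for the engine bus; this is not the iii SDK.
type FnHandler = (payload: unknown) => unknown;

class Bus {
  private handlers = new Map<string, FnHandler>();

  // A worker registers a function under a bus-level id.
  register(functionId: string, handler: FnHandler): void {
    this.handlers.set(functionId, handler);
  }

  // Harness workers and application workers call each other the same way.
  trigger(functionId: string, payload: unknown): unknown {
    const handler = this.handlers.get(functionId);
    if (!handler) throw new Error(`no worker registered ${functionId}`);
    return handler(payload);
  }
}

const bus = new Bus();

// "Worker" 1: a credential worker (the function id comes from the article).
bus.register("auth::get_token", () => ({ token: "demo-token" }));

// "Worker" 2: a toy provider that reaches the credential worker via trigger.
bus.register("provider::demo::complete", (payload) => {
  const { token } = bus.trigger("auth::get_token", {}) as { token: string };
  const { prompt } = payload as { prompt: string };
  return { text: `echo(${prompt})`, authorized: token.length > 0 };
});

const reply = bus.trigger("provider::demo::complete", { prompt: "hi" }) as {
  text: string;
  authorized: boolean;
};
console.log(reply.text, reply.authorized);
```

Neither worker imports the other. The only coupling is the function id on the bus, which is what makes each one replaceable.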

§ 4

If you strip a production agent harness back to its responsibilities, you get a list that looks roughly like this:

  1. Accept a turn request from a client and persist it
  2. Resolve credentials for whichever model provider gets called
  3. Look up what the chosen model can actually do (vision, tools, streaming, context window)
  4. Drive the per-turn state machine: provision, stream assistant, run tools, steer, tear down
  5. Load and serve skill bodies that describe each function's request shape, error codes, and usage notes
  6. Assemble the system prompt, mode paragraph, identity preamble, working directory, and default skills appendix
  7. Stream tokens back to the client as the model produces them
  8. Check every tool call (that’s just a function) against a policy before it runs
  9. Pause tool calls that need a human decision and route the answer back to the right turn
  10. Track LLM spend against per-workspace or per-agent budgets
  11. Run hooks before and after tool calls (logging, redaction, custom side effects)
  12. Persist the session as a branching tree so forks and resumes work
  13. Compact session history when the context window fills up
  14. Emit an event stream that the UI subscribes to
  15. The piece I see missing at every company building agents: carry one OpenTelemetry trace across every step so you can debug it

Every serious agent harness does most of these. The expensive ones do all of them. The cheap ones cut corners and rebuild the corners later when they hit production. The frameworks bundle them into a monolith and ship one version of each. That last part is the part that costs you, because a year in, you find out that the policy engine you want is not the policy engine the framework ships, and replacing it means replacing the harness.

§ 5

The iii harness ships every one of those jobs as a separate worker on the workers.iii.dev registry. Each speaks the same WebSocket protocol. Each registers functions and triggers on the same engine bus. Each is iii worker add-able, swappable, and writable in any language with an SDK.

§ 6

Here is the actual production stack from the iii-hq/workers monorepo, with each worker's job in one line. The whole bundle ships at github.com/iii-hq/workers/harness:

Eleven workers. One engine. Each is on a published version. Each is independently runnable as a standalone process (pnpm dev:<worker> in dev, iii worker add <specific-worker> as a release binary) or as part of the composite entry point that spins them up together.

The reason this matters: every box in that table is a place where someone can hand you a different worker, and you keep the rest. Don't like the static model catalogue? Plug in a worker that registers models::list and reads from a live API. Don't like file-backed credentials? Plug in a worker that registers auth::get_token and reads from a secrets manager. Want a different turn FSM for a workflow that branches differently? Replace turn-orchestrator. Every dependent calls run::start and reads turn_state through the same bus, so the rest of the stack doesn't change.

§ 7

The shape of one turn looks like this, walking through the workers in the order they fire.

A browser/cli/chat POSTs a turn through harness::trigger with {session_id, message_id, payload}. The harness meta-worker forwards payload to run::start. That hop exists so the OpenTelemetry span wrapper can seed the session and message IDs as baggage, which propagates to every nested iii.trigger call across every worker in the stack. The trace tree on the other side is one connected graph.
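
The forwarding hop can be pictured as seeding a small context once and binding every nested call to it. The sketch below models that by hand, as an assumption-laden illustration: `triggerWith`, `harnessTrigger`, and the `Ctx` shape are hypothetical names, and a real stack would rely on OpenTelemetry context propagation rather than an explicit argument. The call from `run::start` to `models::get` is there only to create a nested hop.

```typescript
// Hypothetical sketch: baggage seeded at the entry hop reaches every nested trigger.
type Baggage = { session_id: string; message_id: string };
type Ctx = { baggage: Baggage; trigger: (id: string, payload: unknown) => unknown };
type CtxHandler = (payload: unknown, ctx: Ctx) => unknown;

const handlers = new Map<string, CtxHandler>();
const seen: string[] = []; // records "<function id>@<session id>" for each hop

function triggerWith(baggage: Baggage, id: string, payload: unknown): unknown {
  const handler = handlers.get(id);
  if (!handler) throw new Error(`no worker registered ${id}`);
  seen.push(`${id}@${baggage.session_id}`);
  // The trigger handed to the handler is already bound to the same baggage.
  const ctx: Ctx = { baggage, trigger: (next, p) => triggerWith(baggage, next, p) };
  return handler(payload, ctx);
}

// Entry hop: seed baggage from the request, then forward only the payload.
function harnessTrigger(req: {
  session_id: string;
  message_id: string;
  payload: unknown;
}): unknown {
  const baggage: Baggage = { session_id: req.session_id, message_id: req.message_id };
  return triggerWith(baggage, "run::start", req.payload);
}

// Illustrative handlers only; the nested call exists to show propagation.
handlers.set("run::start", (payload, ctx) => ctx.trigger("models::get", payload));
handlers.set("models::get", (_payload, ctx) => ({ traced_as: ctx.baggage.message_id }));

const traced = harnessTrigger({
  session_id: "s1",
  message_id: "m1",
  payload: { model: "demo" },
}) as { traced_as: string };
console.log(traced.traced_as, seen);
```

The innermost handler never received the IDs in its payload, yet it can still attribute its work to the right session and message.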

run::start lands on the turn-orchestrator. It persists the run request, seeds the initial TurnStateRecord in iii state at session/<sid>/turn_state, and returns immediately. The actual work happens inside the durable per-state machine, woken by publishes to the turn-step FIFO.

The two terminal states are stopped (clean exit via finishSession()) and failed (an unexpected handler throw routes here, acks the queue so it stops retrying, and surfaces message_complete{stop_reason:'error'} plus agent_end so the UI shows the reason). Teardown is an inline finishSession() port called from any turn-end path, not a separate enqueued step.

provisioning does three things. It boots an iii-sandbox microVM if the run needs isolated execution. It calls directory::skills::download for every namespace in system_default_skills (default ["iii://iii-directory/index"]) so iii-directory pre-caches the skill bodies the run starts with. And it assembles the system prompt in three layers: a mode paragraph picked from run_request.mode (plan, ask, or agent), the iii identity preamble that teaches the model the agent_trigger convention and the directory::skills::get on-demand discovery pattern, and an appended index of the default skills the agent boots with. The caller can override the whole prompt by passing system_prompt on run::start; otherwise the orchestrator builds it. Function schemas come from the live engine catalog.
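
The three-layer assembly and the override rule reduce to a short function. This is a sketch under stated assumptions: `assembleSystemPrompt` is a hypothetical name, the bracketed strings are placeholders because the article does not show the real paragraphs or preamble, and rendering the skills index as a bulleted list is a guess at the format.

```typescript
type Mode = "plan" | "ask" | "agent";

interface RunRequest {
  mode: Mode;
  system_prompt?: string; // caller override, as described in the article
}

// Placeholder texts: the real mode paragraphs and preamble are not shown in the article.
const MODE_PARAGRAPHS: Record<Mode, string> = {
  plan: "[plan mode paragraph]",
  ask: "[ask mode paragraph]",
  agent: "[agent mode paragraph]",
};
const IDENTITY_PREAMBLE = "[iii identity preamble]";

function assembleSystemPrompt(req: RunRequest, defaultSkills: string[]): string {
  // An explicit system_prompt is used verbatim and skips all three layers.
  if (req.system_prompt !== undefined) return req.system_prompt;
  // Layer 3: an index of the default skills (list format is an assumption).
  const skillsIndex = ["Default skills:", ...defaultSkills.map((s) => `- ${s}`)].join("\n");
  return [MODE_PARAGRAPHS[req.mode], IDENTITY_PREAMBLE, skillsIndex].join("\n\n");
}

const built = assembleSystemPrompt({ mode: "plan" }, ["iii://iii-directory/index"]);
const custom = assembleSystemPrompt(
  { mode: "agent", system_prompt: "You are terse." },
  ["iii://iii-directory/index"],
);
console.log(built);
console.log(custom);
```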

assistant_streaming calls provider::<name>::stream on whichever provider worker matches the run's provider field. The provider worker pulls credentials via auth::get_token (auth-credentials), streams the model's SSE response into an iii channel, and the orchestrator drains that channel emitting message_update events on agent::events for the UI fanout. Channel creation and the read loop live behind a pull-based MessagePump in provider-stream.ts, so the streaming state stays focused on transitions.

When the assistant returns tool calls, the FSM enters function_execute. Every tool call passes through dispatchWithHook, the single chokepoint in the orchestrator. consultBefore calls policy::check_permissions directly with a 5-second timeout. The policy worker (the harness meta-worker, in the default stack) reads iii-permissions.yaml, matches the call's function_id against the rule set, and returns one of three outcomes:

  • allow: dispatch proceeds; the orchestrator triggers the target function and writes the result
  • deny: dispatch short-circuits with a DenialEnvelope, the result becomes a denial record
  • needs_approval: the individual call parks into the turn's awaiting_approval list. The rest of the batch keeps dispatching. The turn transitions to function_awaiting_approval only when one or more entries are pending.
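
The three outcomes above can be sketched as a loop over one batch of tool calls. `dispatchBatch` and the types are hypothetical, and the function ids in the usage lines are invented. The point is that a parked call does not block the rest of the batch, and the turn only moves to function_awaiting_approval if something was actually parked.

```typescript
type PolicyDecision = "allow" | "deny" | "needs_approval";

interface ToolCall {
  function_call_id: string;
  function_id: string;
}

interface BatchResult {
  results: Record<string, string>; // function_call_id -> result or denial record
  awaiting_approval: string[]; // parked function_call_ids
  next_state: "function_awaiting_approval" | "steering_check";
}

function dispatchBatch(
  calls: ToolCall[],
  check: (call: ToolCall) => PolicyDecision,
  run: (call: ToolCall) => string,
): BatchResult {
  const results: Record<string, string> = {};
  const awaiting: string[] = [];
  for (const call of calls) {
    const decision = check(call);
    if (decision === "allow") {
      results[call.function_call_id] = run(call); // dispatch proceeds
    } else if (decision === "deny") {
      results[call.function_call_id] = "denied"; // stand-in for a denial record
    } else {
      awaiting.push(call.function_call_id); // park this call, keep dispatching the rest
    }
  }
  return {
    results,
    awaiting_approval: awaiting,
    next_state: awaiting.length > 0 ? "function_awaiting_approval" : "steering_check",
  };
}

// Invented function ids and a toy policy table, for illustration only.
const toyPolicy: Record<string, PolicyDecision> = {
  "fs::read": "allow",
  "shell::exec": "needs_approval",
  "secrets::dump": "deny",
};
const batch = dispatchBatch(
  [
    { function_call_id: "c1", function_id: "fs::read" },
    { function_call_id: "c2", function_id: "shell::exec" },
    { function_call_id: "c3", function_id: "secrets::dump" },
  ],
  (call) => toyPolicy[call.function_id] ?? "deny",
  (call) => `ran ${call.function_id}`,
);
console.log(batch);
```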

The approval wake is reactive and shared. The orchestrator registers exactly one turn::on_approval state trigger on scope approvals. When the console calls approval::resolve, the approval-gate worker writes approvals/<sid>/<cid> = {decision, reason} to iii state. That write fires turn::on_approval, which advances the affected session. function_awaiting_approval reads only the decisions that just landed, dispatches each one as it arrives (allow becomes a pre-approved dispatch, deny or aborted becomes a synthetic denial), and advances when awaiting_approval[] is empty. No per-call resume functions to register. No startup re-scan to recover pending approvals. One trigger covers every session.
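
A rough model of that wake path: one trigger on the approvals scope, with the session recovered from the written key. `StateStore`, `approvalResolve`, and the bookkeeping maps are hypothetical stand-ins for iii state and the orchestrator's internals. Only the key layout approvals/<sid>/<cid> and the decision values come from the article.

```typescript
type ApprovalDecision = "allow" | "deny" | "aborted";
interface ApprovalWrite {
  decision: ApprovalDecision;
  reason?: string;
}
type StateTrigger = (key: string, value: ApprovalWrite) => void;

// Hypothetical stand-in for iii state with scope-level triggers.
class StateStore {
  private data = new Map<string, ApprovalWrite>();
  private triggers = new Map<string, StateTrigger[]>();

  onScope(scope: string, fn: StateTrigger): void {
    this.triggers.set(scope, [...(this.triggers.get(scope) ?? []), fn]);
  }

  set(scope: string, key: string, value: ApprovalWrite): void {
    this.data.set(`${scope}/${key}`, value);
    for (const fn of this.triggers.get(scope) ?? []) fn(key, value);
  }
}

const store = new StateStore();
// Parked calls per session, as the orchestrator would track them.
const awaitingApproval = new Map<string, Set<string>>([
  ["s1", new Set(["c1", "c2"])],
  ["s2", new Set(["c9"])],
]);
const dispatched: string[] = [];
const advancedSessions: string[] = [];

// One trigger for the whole scope: no per-call resume function is registered.
store.onScope("approvals", (key, value) => {
  const slash = key.indexOf("/");
  const sid = key.slice(0, slash);
  const cid = key.slice(slash + 1);
  const pending = awaitingApproval.get(sid);
  if (!pending || !pending.delete(cid)) return; // unknown session or call
  // allow -> pre-approved dispatch; deny or aborted -> synthetic denial
  dispatched.push(`${cid}:${value.decision === "allow" ? "dispatch" : "synthetic_denial"}`);
  if (pending.size === 0) advancedSessions.push(sid); // awaiting_approval[] is empty
});

// What approval::resolve boils down to in this sketch: a plain state write.
function approvalResolve(sid: string, cid: string, write: ApprovalWrite): void {
  store.set("approvals", `${sid}/${cid}`, write);
}

approvalResolve("s1", "c1", { decision: "allow" });
approvalResolve("s2", "c9", { decision: "deny", reason: "not today" });
approvalResolve("s1", "c2", { decision: "aborted" });
console.log(dispatched, advancedSessions);
```

Session s2 advances before s1 because its only pending call was resolved first. The trigger itself holds no per-session state.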

Fail-closed by construction: if the policy worker is unreachable or the 5-second timeout fires, consultBefore denies the call with a gate_unavailable envelope. If iii::durable::publish itself errored, the hook fanout returns publish_failed: true and the orchestrator treats it as a deny.
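
The fail-closed rule can be sketched as a race between the policy call and a timer, with every failure path collapsing to a deny. This `consultBefore` is an illustrative reimplementation, not the orchestrator's code. The `PolicyReply` shape and the use of `reason` to carry gate_unavailable are assumptions, and the timeout is a parameter so the example can run with short values instead of the article's 5 seconds.

```typescript
type PolicyDecision = "allow" | "deny" | "needs_approval";
interface PolicyReply {
  decision: PolicyDecision;
  reason?: string;
}

async function consultBefore(
  check: () => Promise<PolicyReply>,
  timeoutMs: number, // the article's default stack uses 5000
): Promise<PolicyReply> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error("policy timeout")), timeoutMs);
  });
  try {
    // Whichever settles first wins: the policy reply or the timeout rejection.
    return await Promise.race([check(), timeout]);
  } catch {
    // Unreachable worker or timeout: deny instead of letting the call through.
    return { decision: "deny", reason: "gate_unavailable" };
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Usage: a healthy gate, a hung gate, and a gate that throws.
const healthy = consultBefore(async () => ({ decision: "allow" }), 50);
const hung = consultBefore(() => new Promise<PolicyReply>(() => {}), 10);
const broken = consultBefore(async () => {
  throw new Error("connection refused");
}, 50);
```

The important property is that there is no code path where a missing answer is interpreted as permission.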

A few latency wins fall out of this shape. The after-function-call hook short-circuits publish_collect via a subscriber-presence cache when no durable subscriber is registered for the topic, removing roughly 500ms per executed function call. tearing_down is inlined into finishSession(), removing one durable queue hop per turn. context-compaction subscribes to a dedicated agent::turn_end stream the orchestrator emits at turn boundaries, so compactor wakeups are per-turn instead of per-event. The session-create fanout state trigger gates by scope alone and matches in-process, so the previous per-write harness::session::is_create_event RPC is gone.

After the batch completes, steering_check decides whether to continue, stop, or hit max_turns. If continue, loop back to assistant_streaming. If stop or max, finishSession() runs inline: emit agent_end, free the sandbox, transition to stopped.
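
Putting the walk-through together, the turn loop can be sketched as a small transition function over the states named above. The state names come from the article. The event names are hypothetical, and two transitions are assumptions that the article does not spell out: a reply with no tool calls going straight to steering_check, and drained approvals leading to steering_check.

```typescript
type TurnState =
  | "provisioning"
  | "assistant_streaming"
  | "function_execute"
  | "function_awaiting_approval"
  | "steering_check"
  | "stopped"
  | "failed";

// Hypothetical event names; the article names the states, not the events.
type TurnEvent =
  | { type: "provisioned" }
  | { type: "assistant_done"; tool_calls: number }
  | { type: "batch_done"; pending_approvals: number }
  | { type: "approvals_drained" }
  | { type: "steer"; verdict: "continue" | "stop" | "max_turns" }
  | { type: "handler_threw" };

function step(state: TurnState, event: TurnEvent): TurnState {
  if (state === "stopped" || state === "failed") return state; // terminal states
  if (event.type === "handler_threw") return "failed";
  switch (state) {
    case "provisioning":
      return event.type === "provisioned" ? "assistant_streaming" : state;
    case "assistant_streaming":
      if (event.type !== "assistant_done") return state;
      // Assumption: a reply without tool calls goes straight to steering.
      return event.tool_calls > 0 ? "function_execute" : "steering_check";
    case "function_execute":
      if (event.type !== "batch_done") return state;
      return event.pending_approvals > 0 ? "function_awaiting_approval" : "steering_check";
    case "function_awaiting_approval":
      // Assumption: once every pending approval is handled, steering runs next.
      return event.type === "approvals_drained" ? "steering_check" : state;
    case "steering_check":
      if (event.type !== "steer") return state;
      // stop and max_turns both end the turn via the inline finishSession() path.
      return event.verdict === "continue" ? "assistant_streaming" : "stopped";
  }
}

const turnEvents: TurnEvent[] = [
  { type: "provisioned" },
  { type: "assistant_done", tool_calls: 2 },
  { type: "batch_done", pending_approvals: 1 },
  { type: "approvals_drained" },
  { type: "steer", verdict: "continue" },
  { type: "assistant_done", tool_calls: 0 },
  { type: "steer", verdict: "stop" },
];
const endState = turnEvents.reduce(step, "provisioning" as TurnState);
console.log(endState);
```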

§ 8

The interesting part is that none of the workers above are special. Each one is a process that opens a WebSocket to the engine, registers some functions and triggers, and runs. The contract is the same as the contract every application worker uses. The harness is built on the same primitive your business logic is built on.

Which means "build your own harness" decomposes into the same operation as "write any worker." You pick the layer you want to replace, you write a worker that registers the same functions on the bus, you iii worker add it, and the rest of the stack starts using your worker.

§ 9

Five concrete examples.

Replace the model catalogue with a live API. Write a worker that registers models::list, models::get, models::supports. Have it fetch from your provider's catalog endpoint every N minutes and cache. Publish it. iii worker add your-org/dynamic-models-catalog. Stop the static models-catalog worker. The turn-orchestrator never knows the difference. It calls iii.trigger('models::list') and the engine routes to whichever worker registered that function id most recently.
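
A sketch of that swap, assuming the "most recent registration wins" routing described above. The `Bus` class, `makeDynamicCatalog`, and the injected fetcher and clock are hypothetical stand-ins. A real worker would fetch over HTTP on a timer, which is replaced here by a synchronous function so the caching rule stays visible.

```typescript
type FnHandler = (payload: unknown) => unknown;

// Hypothetical bus that routes to the most recent registration for a function id.
class Bus {
  private registrations = new Map<string, { workerId: string; handler: FnHandler }[]>();

  register(workerId: string, functionId: string, handler: FnHandler): void {
    const list = this.registrations.get(functionId) ?? [];
    list.push({ workerId, handler });
    this.registrations.set(functionId, list);
  }

  disconnect(workerId: string): void {
    this.registrations.forEach((list, functionId) => {
      this.registrations.set(functionId, list.filter((r) => r.workerId !== workerId));
    });
  }

  trigger(functionId: string, payload: unknown): unknown {
    const list = this.registrations.get(functionId) ?? [];
    const latest = list[list.length - 1];
    if (!latest) throw new Error(`no worker registered ${functionId}`);
    return latest.handler(payload);
  }
}

// Caches the fetched list and refreshes it once ttlMs has elapsed.
function makeDynamicCatalog(fetchModels: () => string[], ttlMs: number, now: () => number) {
  let cached: string[] = [];
  let fetchedAt = -Infinity;
  return (): string[] => {
    if (now() - fetchedAt >= ttlMs) {
      cached = fetchModels();
      fetchedAt = now();
    }
    return cached;
  };
}

let clock = 0; // fake clock in milliseconds
let fetches = 0; // counts calls to the pretend catalog endpoint
const bus = new Bus();

bus.register("models-catalog", "models::list", () => ["static-model"]);
const fromStatic = bus.trigger("models::list", {}) as string[];

const listLive = makeDynamicCatalog(
  () => {
    fetches += 1;
    return [`live-model-${fetches}`];
  },
  60000,
  () => clock,
);
bus.register("dynamic-models-catalog", "models::list", () => listLive());
const fromDynamic = bus.trigger("models::list", {}) as string[];
console.log(fromStatic, fromDynamic);
```

The caller's code is identical before and after the swap. Only the registration behind the function id changed.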

Add a new provider. provider-kimi and provider-lmstudio already prove out the shape. Each is one worker that registers provider::<name>::stream and provider::<name>::complete, drains an SSE stream from the upstream API into an iii channel, and writes its model usage to llm-budget via budget::record. Adding a fifth provider is writing one folder with one iii.worker.yaml and one register.ts. Publish to the registry, or keep it local. The turn-orchestrator picks the provider by the run's provider field; new providers become available the instant the worker connects.
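
Roughly, a provider worker's registration could look like the sketch below. Everything here is illustrative: `registerProvider`, the in-memory `registry`, the `acme` provider, and the `UsageRecord` fields are hypothetical, and character counts stand in for token usage. The function ids provider::<name>::stream, provider::<name>::complete, and budget::record are the ones named in the article.

```typescript
type FnHandler = (payload: unknown) => unknown;
const registry = new Map<string, FnHandler>();

function trigger(functionId: string, payload: unknown): unknown {
  const handler = registry.get(functionId);
  if (!handler) throw new Error(`no worker registered ${functionId}`);
  return handler(payload);
}

// Hypothetical usage shape; character counts stand in for token counts.
interface UsageRecord {
  provider: string;
  input_chars: number;
  output_chars: number;
}
const ledger: UsageRecord[] = [];
registry.set("budget::record", (payload) => {
  ledger.push(payload as UsageRecord);
  return { ok: true };
});

// Registers the two function ids a provider worker exposes.
function registerProvider(providerName: string, generate: (prompt: string) => string[]): void {
  const run = (payload: unknown): string[] => {
    const { prompt } = payload as { prompt: string };
    const chunks = generate(prompt); // stand-in for chunks drained from an SSE stream
    trigger("budget::record", {
      provider: providerName,
      input_chars: prompt.length,
      output_chars: chunks.join("").length,
    });
    return chunks;
  };
  registry.set(`provider::${providerName}::stream`, (p) => run(p));
  registry.set(`provider::${providerName}::complete`, (p) => run(p).join(""));
}

// Orchestrator side: the run's provider field picks the function id.
function callModel(run: { provider: string; prompt: string }): string {
  return trigger(`provider::${run.provider}::complete`, { prompt: run.prompt }) as string;
}

registerProvider("acme", (prompt) => ["you said: ", prompt]);
const answer = callModel({ provider: "acme", prompt: "ping" });
console.log(answer, ledger);
```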

Serve skills from a private artifact store. Write a worker that registers directory::skills::get and directory::skills::list, backed by your internal docs system or a private S3 bucket. Disconnect or rename the default iii-directory worker. The orchestrator's bootstrap calls directory::skills::download per namespace; your worker answers. The agent's "fetch the per-function skill before calling a new function" pattern keeps working unchanged because the wire shape is the same.

Override the system prompt entirely. run::start accepts an optional system_prompt field. Pass it and the orchestrator uses your string verbatim, skipping the mode paragraph + identity preamble + skills appendix assembly. Useful when you have an existing prompt asset you want the harness to honour without modification. Skill download still runs in bootstrap, so the agent keeps directory::skills::get on-demand discovery even with a custom prompt.

Replace the approval gate UI surface. The default approval-gate worker registers approval::resolve. The wire schema is one function call:

iii.trigger('approval::resolve', {
  session_id: '...',
  function_call_id: '...',
  decision: 'allow', // one of 'allow', 'deny', 'aborted'
  reason: 'optional human text',
})

The handler persists approvals/<sid>/<cid> = {decision, reason} to iii state. The orchestrator's single turn::on_approval state trigger picks that write up and wakes the right session. If you want to drive approvals from Slack instead of the console, write a Slack worker that listens for /approve <id> and /deny <id> slash commands, then calls approval::resolve with the right payload. The orchestrator never knows the difference. The whole approval-gate worker stays untouched. You added a new worker; you didn't replace the existing one.
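
The Slack surface mostly comes down to parsing a command and building the approval::resolve payload. In the sketch below, the `<id>` format is an assumption: it is taken to be `<session_id>/<function_call_id>`, mirroring the approvals/<sid>/<cid> key, because the article does not say what the id encodes. `parseSlashCommand`, `onSlashCommand`, and the injected trigger function are hypothetical, and no Slack SDK is involved.

```typescript
type ApprovalDecision = "allow" | "deny" | "aborted";

interface ResolvePayload {
  session_id: string;
  function_call_id: string;
  decision: ApprovalDecision;
  reason: string;
}

type ResolveTrigger = (functionId: string, payload: ResolvePayload) => void;

// Assumption: <id> is "<session_id>/<function_call_id>".
function parseSlashCommand(text: string, user: string): ResolvePayload | null {
  const match = /^\/(approve|deny)\s+([^\s\/]+)\/([^\s\/]+)$/.exec(text.trim());
  if (!match) return null;
  return {
    session_id: match[2],
    function_call_id: match[3],
    decision: match[1] === "approve" ? "allow" : "deny",
    reason: `via Slack by ${user}`,
  };
}

// The Slack worker's whole job: translate a command into one bus call.
function onSlashCommand(text: string, user: string, trigger: ResolveTrigger): boolean {
  const payload = parseSlashCommand(text, user);
  if (!payload) return false; // not an approval command, ignore it
  trigger("approval::resolve", payload);
  return true;
}

// Usage with a recording stand-in for the bus.
const sent: { functionId: string; payload: ResolvePayload }[] = [];
const record: ResolveTrigger = (functionId, payload) => {
  sent.push({ functionId, payload });
};
const handledApprove = onSlashCommand("/approve s1/c7", "dana", record);
const handledNoise = onSlashCommand("hello there", "dana", record);
console.log(handledApprove, handledNoise, sent);
```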

If you want a different policy engine (OPA, Cedar, your own DSL), write a worker that registers policy::check_permissions and returns { decision, rule_id?, matched_constraint? }. Disconnect the default policy worker (which is wrapped inside the harness meta-worker, so you'd disable that handler or run a stripped-down meta-worker). The turn-orchestrator's consultBefore doesn't know the difference. Same 5-second timeout, same fail-closed semantics, same wire shape.
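
As a sketch of what a replacement policy worker's handler might compute, here is a tiny rule matcher that returns the reply shape named above. The `Rule` format, the trailing-`*` prefix patterns, first-match-wins ordering, and default-deny are all assumptions made for this example. The schema of iii-permissions.yaml is not shown in the article, and a real replacement would delegate to OPA, Cedar, or a DSL.

```typescript
type PolicyDecision = "allow" | "deny" | "needs_approval";

// Reply shape from the article: { decision, rule_id?, matched_constraint? }
interface PolicyReply {
  decision: PolicyDecision;
  rule_id?: string;
  matched_constraint?: string;
}

// Hypothetical rule format: an exact function id, or a prefix ending in "*".
interface Rule {
  id: string;
  pattern: string;
  decision: PolicyDecision;
}

function matchesPattern(pattern: string, functionId: string): boolean {
  return pattern.endsWith("*")
    ? functionId.startsWith(pattern.slice(0, -1))
    : pattern === functionId;
}

// The body a worker would register under policy::check_permissions.
function checkPermissions(rules: Rule[], functionId: string): PolicyReply {
  for (const rule of rules) {
    // First matching rule wins (an assumption, not the documented behavior).
    if (matchesPattern(rule.pattern, functionId)) {
      return { decision: rule.decision, rule_id: rule.id, matched_constraint: rule.pattern };
    }
  }
  return { decision: "deny" }; // default-deny when nothing matches (also an assumption)
}

const rules: Rule[] = [
  { id: "r1", pattern: "fs::read", decision: "allow" },
  { id: "r2", pattern: "shell::*", decision: "needs_approval" },
  { id: "r3", pattern: "*", decision: "deny" },
];
console.log(checkPermissions(rules, "shell::exec"));
```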

§ 10

The classic harness debate frames itself as thin vs thick. Anthropic's thin loop versus LangGraph's explicit DAG. The framing assumes you pick one side and live with it.

When the harness is composed of workers on the same bus, thin vs thick is just a count of how many workers you install. A thin harness is turn-orchestrator plus provider-anthropic plus auth-credentials plus a minimal harness meta-worker. That's it. No approvals, no budgets, no policy engine, no hook fanout. Run anything. Trust the model. Useful for autonomous research agents, experimental loops, anything internal.

A thick harness is all thirteen workers plus context-compaction plus a custom policy worker plus a custom approval-gate plus a Slack-integrated approval surface plus the budget worker enforcing per-workspace caps. Useful for an agent running customer workflows where every tool call needs to be auditable and every model spend has to roll up to a finance dashboard.

The architectural distance between thin and thick isn't a rewrite. It's a config change. Same wire protocol, same trace shape, same observability story. The slider moves by adding and removing workers from your config.yaml. Everything else holds.

§ 11

It applies inside a single worker too. The turn-orchestrator just shipped a refactor that collapsed its FSM from eleven states to seven, deleted the per-call turn::approval_resume::<sid>/<cid> mechanism in favour of one reactive turn::on_approval state trigger on scope approvals, and inlined tearing_down into a finishSession() port. Every other worker in the stack (approval-gate, session, llm-budget, providers, models-catalog, auth-credentials, hook-fanout, context-compaction) stayed unchanged. The approval::resolve wire shape didn't move. The contracts held. That's the property the composition gives you: a major internal rewrite of one worker is a self-contained change because every neighbour talks to it through bus-level function ids.

This is the part the framework model can't give you. A framework picks a position on the slider for you and locks you in. The worker model leaves the slider in your hand.

§ 12

If you've been running an agent on top of a framework and feeling the same boundary problems most teams hit at scale, the answer is probably not "rewrite the harness in our own framework." The policy engine doesn't extend the way you need. The approval UI is wired into the framework's chat surface. The credential store can't talk to your secrets manager. The budget tracker is in a sidecar database the trace can't see. The answer is to switch to a substrate where the harness is decomposed in the first place.

The fastest way to feel the argument is to clone github.com/iii-hq/workers, pnpm install, pnpm build, and run the composite entry point. You'll get the full fourteen-worker harness pointed at an iii engine. You can disable any worker by removing its entry from the boot list. You can swap any worker by writing a replacement that registers the same function ids. You can extend any worker by adding a subscriber to its hook topics. hook-fanout::publish_collect is the generic primitive every iii hook builds on.

The docs live at iii.dev/docs. The engine is at github.com/iii-hq/iii. The worker registry is at workers.iii.dev. The harness bundle is at github.com/iii-hq/workers/harness.

§ 13

A harness is not a thing you install. A harness is a set of jobs your system has to do for an agent to run durably, safely and observably. The framework era bundled those jobs together because nothing underneath gave you a way to compose them.

iii's bet is that one primitive, a worker that connects to the engine over WebSocket and registers functions and triggers, is small enough to absorb every one of those jobs separately, and that the resulting stack is more useful than any framework because every layer is independently replaceable.

You don't adopt the iii harness. You install the workers you want, write the ones you need, and end up with a harness shaped exactly like your system. Same protocol on every layer. Same trace across every call. Same iii worker add for the parts you take from the registry as for the parts you publish yourself.

That's what "build your own agent harness" looks like when the substrate is the right shape. Pick the workers. Write the missing ones. Compose. The harness is the composition.
