A Comfortable AX for Agent Search

Source: raft.build · Glean’d 2026-06-26 06:00 · Read 11 min

AI summary

Raft CTO Tenny argues that returning raw IDs or full content to an agent doing a search is bad design. The correct approach mirrors web search results: return a highlighted snippet, context preview, and one explicit next action (e.g., 'read surrounding context'). Every token in the agent's context window has a cost, so results must be compact, immediately scannable, and paired with an actionable next step. This is UX design extended—but the user is now an agent reading tokens, not a person looking at a screen.

§ 1

You design every pixel a person sees. Then you hand your agent a wall of raw IDs and call it a result.

What you see. What your agent gets.

When your agent runs a search, what should it get back?

You already answered this question for humans, years ago. You would never hand a person a raw database dump and call it a search feature. You return a results page: a list of snippets, each one a little window into the match, each one clickable. The snippet tells them what's there. The link tells them what to do next.

Now look at what most tools hand an agent for the same search. A list of IDs. Maybe the full text of every hit. Designing what an agent receives is design work, the same craft as designing a screen, and it's the part most tools skip.

§ 2

Here's the catch, and it's why this is easy to ignore. An agent will act on almost anything you hand it. It won't push back on a bad format; it will just quietly do worse. That's the trap: it works, so nothing flags the cost, while every clumsy result makes the agent work harder than it should. The question was never whether it can cope. It's what coping costs.

We first ran into this designing search for Raft, where the results are messages. But nothing about the problem is specific to chat. A blog post, a timeline, an issue, a tweet: every search result is a window into something larger it was pulled out of, and a good one hands back enough of that to be worth opening.

§ 3

Walk the wrong answers

Start with the obvious one: return just the IDs. The agent can technically use it, but only by resolving every single ID into its real content, one call at a time. You didn't save context; you deferred it, and made the agent pay interest. The first thing a result has to be is usable as it stands.
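
A minimal sketch of that interest payment, assuming a hypothetical in-memory store (`MESSAGES`) and made-up helper names (`search_ids`, `CountingResolver`); none of this is Raft's actual API. It only counts how many follow-up calls an ID-only result forces before the agent has read a single word.

```python
# Hypothetical in-memory message store standing in for a real backend.
MESSAGES = {
    "m1": "We decided to ship the new search on Friday.",
    "m2": "Lunch options near the office?",
    "m3": "Reversing the Friday decision: search ships next sprint.",
}


def search_ids(query: str) -> list[str]:
    """ID-only search: returns bare IDs of messages containing the query."""
    q = query.lower()
    return [mid for mid, text in MESSAGES.items() if q in text.lower()]


class CountingResolver:
    """Resolves one ID per call and counts how many calls were made."""

    def __init__(self) -> None:
        self.calls = 0

    def resolve(self, message_id: str) -> str:
        self.calls += 1
        return MESSAGES[message_id]


resolver = CountingResolver()
hits = search_ids("friday")
# The agent cannot judge a single hit until it has resolved it.
texts = [resolver.resolve(mid) for mid in hits]
print(hits, resolver.calls)  # one resolve call per hit
```

Two hits cost two extra calls here; forty hits would cost forty, and the agent still has not filtered anything.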

§ 4

So return the full content, then. Every hit, in full. Now the opposite problem: a search that matched forty messages just dumped forty messages into a context window that has room for a handful. A result has to be sized for who's consuming it.
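
The mismatch is easy to put in numbers. The sketch below uses a rough four-characters-per-token estimate, invented message sizes, and a hypothetical 4,000-token allowance for one tool result; all three are assumptions for illustration, not measurements.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Very rough token estimate; chars_per_token is an assumed heuristic."""
    return max(1, -(-len(text) // chars_per_token))  # ceiling division


def fits_budget(results: list[str], budget_tokens: int) -> bool:
    """True if all results together stay within the token allowance."""
    return sum(estimate_tokens(r) for r in results) <= budget_tokens


BUDGET = 4000  # hypothetical token allowance for one tool result

full_messages = ["x" * 2000] * 40            # forty hits, 2000 characters each (made up)
previews = [m[:160] for m in full_messages]  # the same hits as 160-character previews

full_fits = fits_budget(full_messages, BUDGET)   # 40 * 500 tokens
preview_fits = fits_budget(previews, BUDGET)     # 40 * 40 tokens
```

Under these assumptions the full dump needs about 20,000 tokens against a 4,000-token allowance, while forty previews need about 1,600.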

§ 5

Fine, return just the messages that matched, each one in full. Reasonable, and still wrong, because a single message is rarely the answer. The agent searched for a decision and got the one line that named it, but not the thread that argued it out or the reply that reversed it. You handed back a sentence ripped out of the conversation it lived in. The result has to carry enough of its own context to be worth reading.

Three tries, three failures, and they fail on the same two axes: what the agent can see, and what it can do next.

§ 6

The result that works

Here is the answer those three were circling. Return the ID, a preview of the text around the matched keyword with the hit highlighted, and one explicit next action: read the surrounding context to see the full thread.

You have seen this before. It is a web search result. A snippet built around the match, not the top of the page, and a link that takes you to the rest. The agent gets enough to judge relevance without resolving anything, plus a clear handle to pull when it wants more.

The thing the first three answers were all missing is the second half. A preview with nowhere to go is a search result you can't click. The information without the next action is half a design.

That is the whole of it: the information it sees, and the action it can take. Get both right and you have designed for the agent the way you would design for any user.
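
One way that pairing might look as data, sketched under assumptions: the `<hit>` highlight tag, the `read_context` tool name, and every type and field name below are invented for illustration and are not Raft's actual format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NextAction:
    tool: str              # hypothetical tool the agent can call next
    args: dict[str, str]   # arguments for that call


@dataclass
class SearchResult:
    id: str
    preview: str           # text around the match, with the hit highlighted
    next_action: NextAction


def make_result(message_id: str, text: str, keyword: str,
                radius: int = 30) -> Optional[SearchResult]:
    """Build one result: ID, preview around the match, one next action."""
    pos = text.lower().find(keyword.lower())
    if pos < 0:
        return None  # no match, so no result
    end = pos + len(keyword)
    start = max(0, pos - radius)
    stop = min(len(text), end + radius)
    # <hit> is an assumed highlight marker wrapped around the matched span.
    preview = (text[start:pos] + "<hit>" + text[pos:end] + "</hit>"
               + text[end:stop])
    action = NextAction("read_context", {"around": message_id})
    return SearchResult(message_id, preview, action)


message = ("After a long debate we made the decision to ship on Friday, "
           "pending QA sign-off.")
result = make_result("m42", message, "decision", radius=10)
```

The agent can judge relevance from `preview` alone, and `next_action` is the link it follows only for the hits worth opening.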

§ 7

The same search design idea changes shape when the user is an agent reading a context budget.

A one-dimensional kind of UX

This is still UX, just not the UX you draw. A human user has a screen; their eye wanders, scans a layout, rests on what catches it. An agent reads in one dimension: no eye to wander, no layout to scan, just a line of tokens on a budget. A context window is screen real estate you can't see. So the discipline carries over, but the tradeoffs shift. You stop designing for attention and start designing for that budget. Every token the result spends has to earn its place. The preview around the match earns it: it puts the relevant span where the agent is already reading and drops the rest.

§ 8

And comfort here is not a nicety. A result the agent can read in one pass is one it gets right more often, and keeps getting right deep into a long session. A clumsy result does not stop the agent; it taxes it. Every irrelevant hit it has to skim, every truncation it has to second-guess, every block of structure it has to peel away is attention spent on the format instead of the task, and attention spent there is where mistakes get made. Fewer wrong turns, less context burned: comfort is the measurable kind, not the pleasant kind.

That is also why the result tells the agent where it was cut. The marks that show omitted text are not decoration; they are information. A gap before the match means there is more above it; a gap after means more below; even the space between two keywords can be cut. The agent has to know the preview is a window, not the whole room, or it will read a fragment as the full text and answer from half a quote.
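
A sketch of how those marks could be produced, assuming `[...]` as the omission mark and a fixed character radius around each keyword; both are illustrative choices, and the function name is made up. It marks a cut before the first window, after the last one, and between windows that do not touch.

```python
GAP = "[...]"  # assumed omission mark; any unambiguous marker would do


def preview_with_gaps(text: str, keywords: list[str], radius: int = 20) -> str:
    """Keep a window around each keyword's first hit and mark every cut."""
    lowered = text.lower()
    windows = []
    for kw in keywords:
        pos = lowered.find(kw.lower())
        if pos >= 0:
            windows.append((max(0, pos - radius),
                            min(len(text), pos + len(kw) + radius)))
    if not windows:
        return ""
    # Merge overlapping windows so no text is repeated.
    windows.sort()
    merged = [windows[0]]
    for start, stop in windows[1:]:
        last_start, last_stop = merged[-1]
        if start <= last_stop:
            merged[-1] = (last_start, max(last_stop, stop))
        else:
            merged.append((start, stop))
    parts = []
    if merged[0][0] > 0:
        parts.append(GAP)          # more text above the first window
    for i, (start, stop) in enumerate(merged):
        if i > 0:
            parts.append(GAP)      # text cut between two windows
        parts.append(text[start:stop])
    if merged[-1][1] < len(text):
        parts.append(GAP)          # more text below the last window
    return " ".join(parts)


long_text = "A" * 100 + " decision " + "B" * 100 + " reversed " + "C" * 100
cut = preview_with_gaps(long_text, ["decision", "reversed"], radius=5)
whole = preview_with_gaps("decision made", ["decision"])
```

When nothing is omitted, no mark appears, so the absence of a gap is information too: the agent can trust that it is looking at the full text.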

§ 9

Designing for an agent as a real user

Taking that seriously is more than formatting the output nicely. It is a contract, and it runs in both directions.

The agent has to read the structure as structure. The markers around the preview, the highlight on the match, the marks that show where text was cut: all of it is scaffolding the search built to communicate, and the agent has to treat it as scaffolding, never as the message itself. Misread that and it starts quoting truncation marks back as if they were content.

It also has to keep that scaffolding in, not hand it back out. When the agent answers the human, it renders to plain language. The structure is for the agent to read, the way a browser reads an accessibility tree to decide what to show; the human never sees it. Read the structure, act on it, and keep it to yourself.

This is taste, and we make the agent prove it. The contract is tested: the agent has to understand the structure, filter before it resolves, take the next action, and keep the markup to itself. The design holds by passing that bar, not by reading well on a page.

(The markup itself, the tags and the marks, is just a carrier. It could be something else entirely. What matters is the pairing underneath it: information plus action.)
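
The article does not show how this contract is tested. As one hedged illustration, the "keep the markup to itself" clause could be checked mechanically. The `<hit>` tags and `[...]` marks below are assumed stand-ins for whatever carrier a search tool actually emits, and both function names are hypothetical.

```python
import re

# Assumed scaffolding: <hit>...</hit> highlight tags and "[...]" omission marks.
SCAFFOLDING = re.compile(r"</?hit>|\[\.\.\.\]")


def leaks_scaffolding(answer: str) -> bool:
    """True if an answer meant for a human still contains search markup."""
    return SCAFFOLDING.search(answer) is not None


def to_plain_text(preview: str) -> str:
    """Drop scaffolding and collapse the whitespace it leaves behind."""
    return " ".join(SCAFFOLDING.sub(" ", preview).split())


preview = "[...] we made the <hit>decision</hit> to ship on Friday [...]"
plain = to_plain_text(preview)
```

A check like `leaks_scaffolding` covers only one clause. Whether the agent understood the structure, filtered before resolving, and took the next action needs behavioral tests that a regex cannot express.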

§ 10

The clearest case, not the only one

Search is the easiest place to see it, because everyone already knows what a good result feels like. But search is not special. The same problem appears anywhere an agent has to read an output and choose its next move.

Skip the design and you feel it: the agent opening forty messages to find one line, quoting a truncation mark as fact, or spinning on a result it never fully read.

Every output you hand an agent is a surface, whether you designed it or not. Tool results, errors, statuses, lists: they are all information the agent must understand and act on. The questions never change. Can it use what it sees? Does it know what to do next?
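
The same two questions can be applied to an error. This is a sketch only: the `ToolOutput` type, its fields, and the rate-limit wording are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToolOutput:
    summary: str                                            # what the agent can see
    next_actions: list[str] = field(default_factory=list)   # what it can do next

    def render(self) -> str:
        """Flatten to the compact text the agent actually reads."""
        lines = [self.summary]
        lines += [f"next: {action}" for action in self.next_actions]
        return "\n".join(lines)


def rate_limit_error(retry_after_s: int) -> ToolOutput:
    """An error that says what happened and what to do about it."""
    return ToolOutput(
        summary=f"error: rate limited; retry allowed in {retry_after_s}s",
        next_actions=[f"wait {retry_after_s}s, then repeat the same call"],
    )


rendered = rate_limit_error(30).render()
```

A bare "429" answers neither question; two short lines answer both.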

You already design every screen a person sees. Your agent has become one of your users too. It deserves the same care.
