Browser Automation CLI for AI Agents

Source: github.com · Glean’d 2026-07-01 06:00 · Read: 64 min

AI summary

agent-browser is a native Rust CLI designed for AI agents to automate browser interactions. It uses a client-daemon architecture where the Rust daemon directly communicates with Chrome via CDP, eliminating the Node.js dependency. The tool offers a comprehensive command set covering navigation, element interaction (via ref/CSS/XPath/text selectors), snapshots, screenshots, network interception, session management, and authentication state persistence. It includes built-in safety features like domain allowlists, action policies, and encrypted state storage. It is optimized for AI workflows with accessibility tree snapshots, annotated screenshots, and MCP server support, making it ideal for engineers building AI agents, automated testing, web scraping, or enabling LLMs to control browsers reliably.

§ 1

Agent Browser is a fast, native Rust CLI for browser automation, designed specifically for AI agents. It provides a comprehensive set of commands for web navigation, interaction, data extraction, and debugging, all through a simple command-line interface. It manages a headless Chromium browser under the hood, offering an alternative to browser automation frameworks like Playwright or Puppeteer, but optimized for agentic workflows.

§ 2

Traditional browser automation tools like Playwright require writing code in a programming language and managing complex async flows, which is a poor fit for LLM-based agents that need to perform tasks through simple, stateless commands. Agent Browser addresses this by exposing a flat CLI in which each interaction (clicking, filling, reading) is a single command. It also handles the browser lifecycle automatically: a daemon starts on the first command and stays alive so that subsequent commands run quickly, eliminating per-command startup overhead.

§ 3

The project's key innovation is its snapshot and ref workflow. Running agent-browser snapshot generates an accessibility tree where each interactive element is assigned a deterministic ref like @e1, @e2. These refs can then be used in other commands (click @e1, fill @e2 "text") without complex CSS or XPath selectors. This is ideal for LLMs, which can parse the tree output, identify the target element, and interact using the simple ref identifier. Annotated screenshots further enhance this by overlaying numbered labels on elements that correspond to their refs.
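
As a rough illustration of the ref workflow, the sketch below parses a snapshot-like text and turns a chosen element into a follow-up command. The sample snapshot layout (- role "name" [ref=eN]) and the helper names parse_refs and command_for are assumptions made for illustration; the actual output of agent-browser snapshot may be laid out differently.

```python
import re

# Hypothetical snapshot text; the real layout may differ.
SAMPLE_SNAPSHOT = """\
- heading "Sign in" [ref=e1]
- textbox "Email" [ref=e2]
- button "Continue" [ref=e3]
"""

# Matches lines such as: - button "Continue" [ref=e3]
LINE_RE = re.compile(
    r'^\s*-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)".*?\[ref=(?P<ref>e\d+)\]'
)


def parse_refs(snapshot: str) -> dict:
    """Map (role, accessible name) to a ref such as '@e3'."""
    refs = {}
    for line in snapshot.splitlines():
        m = LINE_RE.match(line)
        if m:
            refs[(m["role"], m["name"])] = "@" + m["ref"]
    return refs


def command_for(action: str, ref: str, *args: str) -> list:
    """Build the argv for a ref-based command, e.g. click @e3."""
    return ["agent-browser", action, ref, *args]


refs = parse_refs(SAMPLE_SNAPSHOT)
print(command_for("fill", refs[("textbox", "Email")], "user@example.com"))
print(command_for("click", refs[("button", "Continue")]))
```

The point of the design is that the agent only has to match a role and a label in the tree, then pass the short ref along; no selector has to be constructed.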

§ 4

Agent Browser is written in Rust and uses a client-daemon architecture. The CLI parses each command and sends it to a long-running Rust daemon, which talks directly to the browser over the Chrome DevTools Protocol (CDP). The daemon starts automatically on the first command and manages browser sessions, tabs, and state. The daemon itself requires neither Node.js nor Playwright. The project also supports multiple browser engines (Chrome, Lightpanda) and cloud providers (Browserless, Browserbase, Kernel, AgentCore) for environments where a local browser is not available.
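
The start-once, reuse-afterwards behavior can be modeled in a few lines. This is a toy in-process model under stated assumptions, not the project's implementation: ToyDaemon and ToyClient are hypothetical names, and the real daemon is a separate Rust process that speaks CDP to the browser.

```python
class ToyDaemon:
    """Stands in for the long-running process that owns the browser."""

    def __init__(self) -> None:
        self.handled = []  # commands seen over the daemon's lifetime

    def handle(self, command: list) -> str:
        self.handled.append(command)
        return "ok: " + " ".join(command)


class ToyClient:
    """Stands in for the CLI: take a command, forward it to the daemon."""

    def __init__(self) -> None:
        self._daemon = None
        self.launches = 0

    def _ensure_daemon(self) -> ToyDaemon:
        # Start the daemon only if it is not already running.
        if self._daemon is None:
            self._daemon = ToyDaemon()
            self.launches += 1
        return self._daemon

    def run(self, *command: str) -> str:
        return self._ensure_daemon().handle(list(command))


client = ToyClient()
client.run("open", "https://example.com")
client.run("snapshot", "-i")
client.run("click", "@e2")
print(client.launches)  # 1
```

Three commands are handled but the startup cost is paid once, which is the property that makes per-command invocation cheap for an agent.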

§ 5

Installation is straightforward via npm (npm install -g agent-browser && agent-browser install), Homebrew, or Cargo. After downloading Chrome with agent-browser install, the core workflow is: agent-browser open <url> to navigate, agent-browser snapshot -i to see interactive elements with refs, and then commands like agent-browser click @e2 or agent-browser fill @e3 "text" to interact. Other essential commands include agent-browser read for fetching agent-friendly text, agent-browser screenshot, and agent-browser close.
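
The core workflow can be driven from any language that can spawn a process. The sketch below uses only the commands named above; the workflow, run_cli, and run_all helpers, the example URL, and the refs @e2 and @e3 are illustrative assumptions, since real refs come from the snapshot output of the page at hand. It defaults to a dry run so that it works without the CLI installed.

```python
import subprocess
from typing import Callable, List


def workflow(url: str, text: str) -> List[List[str]]:
    """Return the argv for each step of the open/snapshot/interact loop."""
    return [
        ["agent-browser", "open", url],
        ["agent-browser", "snapshot", "-i"],
        ["agent-browser", "fill", "@e3", text],  # ref is a placeholder
        ["agent-browser", "click", "@e2"],       # ref is a placeholder
        ["agent-browser", "close"],
    ]


def run_cli(argv: List[str]) -> str:
    """Run one command and return its stdout; raises if the command fails."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout


def run_all(steps: List[List[str]],
            runner: Callable[[List[str]], str] = run_cli) -> List[str]:
    """Run the steps in order; `runner` can be swapped for a dry run."""
    return [runner(step) for step in steps]


# Dry run: show the commands instead of executing them.
steps = workflow("https://example.com", "hello")
for line in run_all(steps, runner=lambda argv: " ".join(argv)):
    print(line)
```

In a real agent loop the snapshot output would be read before choosing refs, rather than hard-coding them as this sketch does.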

§ 6

This tool is ideal for AI agents needing to automate web tasks: testing login flows, scraping data, monitoring dashboards, or interacting with web apps. It is particularly strong in agentic-coding environments (like Claude Code or Cursor) where a simple CLI is preferred. A significant caveat is that the snapshot command returns the accessibility tree, which may not capture all visual states or dynamic content rendered via canvas, WebGL, or Shadow DOM. For complex visual tasks, the --annotate screenshot feature helps but is currently only available on the CDP-backed browser path. The tool also requires a Chrome installation, although cloud providers offer an alternative.
