Daily /2026-07-04 / Self-Healing Browser Harness That Lets LLMs Drive Any Real Browser

Self-Healing Browser Harness That Lets LLMs Drive Any Real Browser

Source github.com Glean’d 2026-07-04 06:00 Read 7 min

AI summary

Browser Harness is a thin, self-healing CDP harness that connects an LLM directly to a real Chrome browser via a single WebSocket, with zero intermediate layers. When the agent needs to perform an action it hasn't seen before (e.g., file upload, cross-origin iframe interaction, drag and drop), it writes the missing helper code on the fly and saves it into an agent-workspace for reuse. The core package is roughly 1K lines, enabling complete freedom for browser automation tasks. Aimed at developers who need AI agents to perform real, unconstrained browser interactions.

Original · 7 min

github.com ↗

§ 1

Browser Harness is a thin, editable CDP (Chrome DevTools Protocol) harness that connects an LLM directly to your real browser. It provides a single WebSocket channel to Chrome with no intermediaries, allowing the agent to write missing helpers during execution and improve the harness with every run. The project is built for tasks where you need complete freedom—no scripted steps, no fixed selectors—just the agent figuring out what to do and how to do it.

Browser Harness 是一个轻量、可编辑的 CDP（Chrome DevTools Protocol）工具，它让 LLM 直接连接到你的真实浏览器。通过一条直连 Chrome 的 WebSocket 通道，没有任何中间层，代理可以在执行过程中动态编写缺失的辅助函数，并在每次运行中自我改进。这个项目专为需要完全自由的浏览器任务而设计——没有预设的脚本步骤，没有固定的选择器，一切由代理自己决定做什么和怎么做。

§ 2

Traditional browser automation (Playwright, Puppeteer) relies on pre-written scripts with hardcoded selectors and flows. When an LLM needs to interact with a web page—like uploading a file, filling a complex form, or navigating a multi-step checkout—the script must be updated every time the page changes. Browser Harness solves this by letting the agent write the missing helper on the fly. The harness itself is less than 1,000 lines of code, and the agent can edit its own workspace files (agent_helpers.py and domain skills) to adapt to new scenarios. This makes the system self-healing: the more it runs, the smarter it gets.

传统的浏览器自动化工具（如 Playwright、Puppeteer）依赖预先编写的脚本，里面写死了选择器和流程。当 LLM 需要与网页交互——比如上传文件、填写复杂表单、或完成多步结账——页面一旦变化，脚本就得跟着改。Browser Harness 的做法是让代理在执行时动态写出缺失的辅助代码。整个工具不到 1000 行，代理可以随时编辑自己的工作区文件（agent_helpers.py 和领域技能），从而适应新场景。这让系统拥有了自修复能力：跑得越多，学得越精。

§ 3

The harness's core is a thin CDP wrapper that opens a WebSocket to Chrome's remote debugging endpoint. The agent can call any CDP command directly, but more importantly, it can write and reuse Python helpers in agent_helpers.py and domain-specific skills in agent-workspace/domain-skills/. When the agent encounters a new task—like uploading a file to a certain site—it figures out the selectors and flow, then saves that knowledge as a browser skill. Next time, the skill is already there. The environment variable BH_DOMAIN_SKILLS=1 enables domain skills during execution. Skills are not hand-authored; they are generated by the agent from actual runs, ensuring they reflect what really works in the live browser.

该工具的核心是一个极简的 CDP 封装，通过 WebSocket 连接到 Chrome 的远程调试端点。代理可以直接调用任何 CDP 命令，但更重要的是，它可以在 agent_helpers.py 中编写并复用 Python 辅助函数，以及在 agent-workspace/domain-skills/ 中生成领域特定的技能。当代理遇到新任务——比如向某个网站上传文件——它会自行摸索出选择器和操作流程，然后将这些知识保存为浏览器技能。下次再遇到同样的任务，技能已经就绪。环境变量 BH_DOMAIN_SKILLS=1 可在执行时启用领域技能。这些技能并非手动编写，而是由代理从实际运行中生成，确保它们反映的是真实浏览器中有效的操作。

§ 4

The project is organized into a small Python package (src/browser_harness/) and a user-facing agent workspace. The four core files are: install.md (first-time setup), SKILL.md (daily usage), src/browser_harness/ (the protected core), and the agent workspace located at ${XDG_CONFIG_HOME:-~/.config}/browser-harness/agent-workspace/. The workspace contains agent_helpers.py (helper code the agent edits) and domain-skills/ (reusable browser skills). The browser-harness CLI command attaches to the running Chrome/Chromium CDP endpoint. For isolated automation, you can launch Chrome yourself with --remote-debugging-port and pass BU_CDP_URL, or use a Browser Use cloud browser. The entire project is around 1,000 lines of code across those files.

项目结构十分精简：一个 Python 包（src/browser_harness/）加一个面向代理的工作区。四个核心文件是：install.md（首次安装）、SKILL.md（日常使用指南）、src/browser_harness/（受保护的核心代码），以及代理工作区（位于 ${XDG_CONFIG_HOME:-~/.config}/browser-harness/agent-workspace/）。工作区包含 agent_helpers.py（代理可编辑的辅助代码）和 domain-skills/（可复用的浏览器技能）。browser-harness 命令行工具会自动连接到正在运行的 Chrome/Chromium CDP 端点。如果需要隔离自动化，可以自行启动 Chrome 并指定 --remote-debugging-port 参数，然后设置 BU_CDP_URL 环境变量，或者使用 Browser Use 云浏览器。整个项目的代码量大约 1000 行。

§ 5

The recommended way to get started is to paste the following setup prompt into your coding agent (e.g., Claude Code or Codex):

Install or upgrade browser-harness to the latest stable version with uv using Python 3.12, register the skill from `browser-harness skill`, and connect it to my browser. Follow https://github.com/browser-use/browser-harness/blob/main/install.md if setup or connection fails.

The agent will then open chrome://inspect/#remote-debugging and walk you through enabling remote debugging. You need to tick the checkbox and allow the per-attach popup (Chrome 144+). Once connected, the harness is ready. You can test it with a simple one-liner:

./browser-harness <<'PY'
print(page_info())
PY

If you want to use Browser Use Cloud browsers (free tier available), grab an API key from cloud.browser-use.com/new-api-key or let the agent sign up itself via docs.browser-use.com/llms.txt.

推荐的入门方式是将以下设置提示词直接粘贴到你的编码代理（如 Claude Code 或 Codex）中：

Install or upgrade browser-harness to the latest stable version with uv using Python 3.12, register the skill from `browser-harness skill`, and connect it to my browser. Follow https://github.com/browser-use/browser-harness/blob/main/install.md if setup or connection fails.

代理会自动打开 chrome://inspect/#remote-debugging 并引导你启用远程调试。你需要勾选复选框，并在 Chrome 144+ 的弹出窗口中点击“允许”。连接成功后，工具即可使用。你可以用一行命令测试：

./browser-harness <<'PY'
print(page_info())
PY

如果你想使用 Browser Use 云浏览器（提供免费套餐），可以从 cloud.browser-use.com/new-api-key 获取 API 密钥，或者让代理通过 docs.browser-use.com/llms.txt 自行注册。

§ 6

Browser Harness shines in scenarios where you need an LLM to perform complex, multi-step browser tasks that are not easily scripted—like scraping data behind login walls, filling out adaptive forms, or automating repetitive workflows on sites that change frequently. It also supports persistent browser sessions, cross-origin iframes, drag-and-drop, file uploads, shadow DOM, and more (see the interaction-skills/ directory). However, it has some limitations: it currently only works with Chrome/Chromium (including headless mode) and requires the remote debugging feature to be enabled. The agent must be capable of writing Python code; if the LLM is weak at code generation, the self-healing mechanism may not work well. Also, because the harness gives the agent full control of the browser, security considerations apply—never run it on untrusted or sensitive pages without proper isolation. The project is MIT licensed and open to contributions, especially domain skills for common sites (LinkedIn, Amazon, etc.).

Browser Harness 最适合的场景是那些需要 LLM 执行复杂、多步骤的浏览器任务，且这些任务难以用脚本固化——比如爬取登录后的数据、填写自适应表单、或在频繁变动的网站上自动化重复流程。它还支持持久化浏览器会话、跨域 iframe、拖拽操作、文件上传、Shadow DOM 等（详见 interaction-skills/ 目录）。不过，它也有一些限制：目前只支持 Chrome/Chromium（包括无头模式），并且需要开启远程调试功能。代理必须能够编写 Python 代码；如果 LLM 的代码生成能力较弱，自修复机制可能无法正常工作。此外，由于工具赋予代理对浏览器的完全控制权，安全方面需要谨慎——在没有适当隔离的情况下，切勿在不可信或敏感页面上运行。项目采用 MIT 许可证，欢迎社区贡献，特别是针对常见网站（如 LinkedIn、Amazon 等）的领域技能。

Open source ↗