Daily /2026-06-17 / A Local-First Context Compression Layer for AI Agents: Library, Proxy, and MCP in One Stack

A Local-First Context Compression Layer for AI Agents: Library, Proxy, and MCP in One Stack

Source github.com Glean’d 2026-06-17 06:01 Read 18 min

AI summary

Headroom is a local-first context compression layer built specifically for AI coding agents. It slashes token consumption by 60-95% by compressing tool outputs, logs, files, and RAG results before they reach the LLM, all while maintaining answer accuracy. Usable as a Python/TypeScript library, a transparent proxy, a CLI wrapper for popular agents, or an MCP server, it fits into existing workflows without friction. Internally, it combines JSON structure-aware compression, AST-based code minification, and a custom fine-tuned model, grounded by a novel CCR reversible compression system that guarantees original data is never lost. This tool is ideal for engineers who rely heavily on coding agents and want to cut API costs without altering their current toolchain.

Original · 18 min

github.com ↗

§ 1

Headroom is a local-first tool that compresses everything your AI agent reads—tool outputs, logs, files, RAG results, and conversation history—before it reaches the LLM. It reduces token usage by 60–95% while keeping answers accurate, giving you dramatic cost savings with zero code changes in many setups.

Headroom 是一个本地优先的工具，它在 AI Agent 读取的内容（工具输出、日志、文件、RAG 结果、对话历史）送达大模型之前进行压缩。它能减少 60–95% 的 token 消耗，同时保持回答准确性，在许多场景下无需改动代码即可显著降低成本。

§ 2

Modern AI coding agents process vast context—long shell outputs, large JSON blobs, and piles of source files. This quickly exhausts context windows and runs up token costs. Headroom acts as an intelligent filter that strips noise, keeps semantically rich signals, and even allows the LLM to fetch back original details on demand via reversible compression.

现代 AI 编程 Agent 要处理海量上下文——长 shell 输出、大块 JSON 和大量源文件，这会迅速耗尽上下文窗口并推高 token 成本。Headroom 充当智能过滤器，滤除噪声，保留语义丰富的信号，甚至允许大模型通过可逆压缩按需取回原始细节。

§ 3

You can use Headroom in three ways: as a Python/TypeScript library, as a lightweight local proxy that intercepts requests for any language, or as an MCP server for the Model Context Protocol ecosystem. It also offers cross-agent shared memory, letting Claude Code, Codex, and others auto-deduplicate and reuse context across sessions.

Headroom 提供三类集成方式：作为 Python/TypeScript 库在代码中调用；启动一个轻量代理（proxy）拦截请求，适用于任意语言；以及作为 MCP 服务器为 MCP 生态提供工具。它还提供跨 Agent 的共享记忆存储，让 Claude Code、Codex 等多个 Agent 自动去重并复用上下文。

§ 4

Content enters the pipeline, where ContentRouter detects body type (JSON, code, prose) and dispatches to the appropriate compressor. SmartCrusher collapses repetitive JSON, CodeCompressor uses AST-aware pruning, and the on-device Kompress-base model handles natural language. CacheAligner stabilizes prefixes for higher KV-cache hit rates, and the CCR module stores originals locally—the LLM can call headroom_retrieve if it needs them.

内容进入管道后，ContentRouter 检测类型（JSON、代码、纯文本）并分派给合适的压缩器。SmartCrusher 折叠重复 JSON，CodeCompressor 基于 AST 剪枝冗余代码，本地 Kompress-base 模型负责自然语言。CacheAligner 稳定前缀以提高 KV 缓存命中率，CCR 模块则将原始内容保存在本地——大模型可按需调用 headroom_retrieve 取回。

§ 5

Install with pip install "headroom-ai[all]" or npm install headroom-ai. Then pick your mode: headroom wrap claude to wrap a coding agent, headroom proxy --port 8787 to spin up a proxy, or use from headroom import compress in your code. Run headroom stats to see token savings immediately.

用 pip install "headroom-ai[all]" 或 npm install headroom-ai 安装。然后选择模式：headroom wrap claude 包裹一个编程 Agent，headroom proxy --port 8787 启动代理，或在代码中 from headroom import compress。运行 headroom stats 即可查看 token 节省情况。

§ 6

On real agent workloads, Headroom achieves 92% savings on SRE debugging and code search, 73% on issue triage, and 47% on codebase exploration. Accuracy on benchmarks like GSM8K and TruthfulQA remains unchanged or slightly improves. These numbers are reproducible via python -m headroom.evals suite --tier 1.

在真实 Agent 工作负载上，Headroom 在 SRE 排障和代码搜索场景中达到 92% 的节省，issue 分诊 73%，代码库探索 47%。在 GSM8K 和 TruthfulQA 等基准测试中准确率保持不变或略有提升。这些数字可通过 python -m headroom.evals suite --tier 1 复现。

§ 7

Headroom is a great fit if you run coding agents daily, work across multiple agents and need shared memory, or require reversible compression. It's less useful if you only use one provider's native history compaction and don't need cross-agent features, or if your environment can't run local processes.

如果你每天使用编程 Agent、跨多个 Agent 工作且需要共享记忆，或者必须保留可逆压缩，Headroom 非常合适。如果你只用单一厂商的原生历史压缩且不需要跨 Agent 功能，或者运行环境无法启动本地进程，则可以跳过。

§ 8

Unlike hosted services, Headroom runs locally on your machine. Unlike CLI-only wrappers, it handles all context types—files, RAG chunks, logs, and more. It also integrates RTK and lean-ctx for shell-output rewriting and supports the MCP protocol natively, making it a comprehensive layer rather than a point solution.

与托管服务不同，Headroom 在你的本地机器上运行。与仅针对 CLI 的包装器不同，它处理所有上下文类型——文件、RAG 块、日志等。它还集成了 RTK 和 lean-ctx 用于 shell 输出重写，并原生支持 MCP 协议，是一个综合层而非单点方案。

Open source ↗