Glean 拾遗
Daily /2026-06-15 / Building cloud agent infrastructure: what's different, and what we learned

Building cloud agent infrastructure: what's different, and what we learned

Source x.com Glean’d 2026-06-15 06:01 Read 10 min
AI summary

A hands-on report from CREAO detailing the architectural challenges of moving AI agents from a single-user desktop to a multi-tenant cloud sandbox. It presents two hard-won lessons. First, decouple slowly-changing user environments from fast-changing platform code by freezing user sandboxes into snapshots and hot-swapping the runner library in ~300ms via an atomic sequence involving chattr, V8 compile cache purging, and post-run re-snapshotting. Second, enforce strict credential isolation by ensuring no long-lived secrets ever enter the sandbox; a host-side API bridge verifies sandbox calls using a dual check of IP allowlisting and short-lived, per-run JWTs, so a compromised agent yields only an expiring, network-pinned token. Concrete commands, validation steps, and design rationale included. Recommended for backend and infrastructure engineers productizing agents in shared environments.

Original · 10 min
x.com ↗
§ 1

Most agent frameworks today assume a desktop. One user, one machine, one process. The agent runs while the laptop is open, writes to a local filesystem, holds API keys in environment variables, and dies when the terminal closes. When something breaks, the user retries. When the agent needs a package, pip install drops it into the user's Python. State, secrets, and lifecycle all sit inside one trusted boundary.

今天的多数 agent 框架仍以桌面为默认场景:一个用户、一台机器、一个进程。agent 在笔记本开盖时运行,写入本地文件系统,API 密钥放在环境变量里,终端一关它便终止。出错了用户重试;缺包了 pip install 装进用户的 Python 环境。状态、密钥和生命周期全都窝在同一个可信边界内。

§ 2

Cloud agent infrastructure has none of those luxuries.

The agent runs on a sandbox that boots fresh, on hardware shared with strangers, triggered by callers the user never meets: a schedule, an HTTP request, another agent. The user is usually asleep when the run happens. The code inside the sandbox may be adversarial. The filesystem has to survive deployments. Credentials cannot live where the agent lives. Every guarantee the desktop gives you for free — persistence, identity, network trust, retry — has to be rebuilt as an explicit system.

云上的 agent 基础设施没有这些奢侈。

agent 运行在每次全新启动的沙箱里,跑在与陌生人共享的硬件上,调起它的调用方用户从未谋面——可能是一个定时任务、一次 HTTP 请求、另一个 agent。执行时用户多半在睡觉。沙箱里的代码可能是恶意的。文件系统要撑得过部署。密钥绝不能与 agent 放在一起。桌面环境免费给的每一项保障——持久性、身份、网络信任、重试——都需要作为显式系统重新构建。

§ 3

We spent the last few months tightening that layer at CREAO. Two lessons came out of it. If you have ever shipped a desktop agent and wondered what changes when it moves to the cloud, this is what changes.

过去几个月,我们在 CREAO 花了很大力气加固这一层。两个教训由此得出。如果你曾交付过桌面 agent 并好奇它搬到云上会发生什么变化——这就是变化所在。

§ 4

Lesson 1: Separate what changes slowly from what changes fast

On a desktop, the user's environment and the agent's runtime are the same thing, updated on the same cadence, by the same person. In the cloud, they are not.

An agent app accumulates state on the platform's side. A stock analyst installs matplotlib, downloads market data, writes charting scripts. That environment is the agent's muscle memory. We freeze it into a sandbox snapshot the moment the user is happy with it, and we hold that snapshot frozen until the user edits the environment again. Every run boots from the same image. Same packages, same files, same versions. Monday's run behaves like Friday's, because nothing underneath has moved.

This is the property that desktop frameworks cannot give you for free. A pip install six months ago resolves to different versions today. A cloud snapshot resolves to the same bytes forever. Reproducibility is something the platform owes the user, and a frozen snapshot is the cheapest way to deliver it.

第一课:把变化慢的东西和变化快的东西分开

在桌面上,用户环境与 agent 运行时是一回事,由同一个人以同样的节奏更新。在云端,二者截然不同。

一个 agent 应用会在平台侧积累状态。一位股票分析师安装了 matplotlib、下载市场数据、编写了图表脚本——这套环境就是 agent 的

Open source ↗