Glean 拾遗
#004 Latest 6/15–6/21 Published Jun 21

From Foxconn to Software Factory: The Paradigm Shift in Agent Architecture

This week's picks converge on a single tension: models keep getting markedly more capable, yet our system architectures often remain stuck in the old era of 'wrapping untrustworthy models in mountains of code.' Garry Tan's series of essays makes the sharpest case for the alternative, a 'thin harness, fat skills' paradigm that he backs with a claimed 810x increase in developer output and a self-improving skill system. Meanwhile, Anthropic's 400K-session analysis, CREAO's cloud sandbox lessons, and the engineering practices of Hermes Agent and Factory 2.0 all point to the same trend: we've been building isolated 'Foxconn factories' for each agent, but what we need are composable, self-evolving 'software factories' that span platforms. This issue is about the paradigm shift happening right now: from testing naked models to building systems, from heavy frameworks to lean skills, from monolithic agents to pluggable ecosystems.

21 picks 4 sections ~5 hr
Section 01

The Old Paradigm's Curse: Naked Models and Foxconn Factories

5 / 21
x.com · 18 min
01

Imagine Naked People Were Stupider. Naked Models Are. (Stop testing naked models, start building systems)

YC partner Garry Tan responds to Kyle Kingsbury's anti-AI essay, arguing that Kingsbury's tests of naked models are like testing an engine on a bench and concluding cars are unsafe. The article details the 'thin harness, fat skills' architecture: skill files (reusable Markdown procedures) constrain model input, resolvers (routing tables) dispatch tasks, deterministic code handles precision operations, and testing covers the full pipeline. Using Kingsbury's own bathroom rendering and stock data hallucination examples, Tan shows how architecture can turn unreliable models into reliable systems, and shares a personal resolver that reduced file misfilings from 10/13 to zero. The automotive metaphor concludes: seatbelts, traffic lights, and crumple zones made cars safe, not skepticism of engines. Targeted at engineers building or evaluating AI systems.

x.com · 14 min
02

Stop building Foxconn factories for your agents

Garry Tan reflects on his experience building a 540,000-line Rails app, using the Foxconn factory as a metaphor for the dominant AI agent development pattern: wrapping hyper-intelligent models in mountains of code, tests, and guardrails. He argues the economics have inverted—model calls are now cheap and the models are smarter, making the old instinct to ration and control them obsolete. The new paradigm is 'just-in-time software' and 'skill packs,' where lean markdown instructions and minimal TypeScript replace bloated engineering frameworks. A concrete example shows a hackathon judge agent built in an afternoon, doing what previously required a full software project. The essay challenges engineers to abandon the 2013 mental model of measuring capability by lines of code and to embrace 'tokenmaxxing' to gain a 2-3 year competitive advantage. It is aimed at engineers who are coding with AI but still trapped by traditional software metrics and mistrustful architectures.

x.com · 18 min
03

Resolvers: The Routing Table for Intelligence (a routing table for intelligent systems, not force-fed context stuffing)

Garry Tan argues that resolvers (lightweight context routers) are the missing governance layer in agent systems, more crucial than models or skills. Using a mis-filed article as a trigger, he demonstrates how a 200-line resolver replaced 20,000 lines of crammed context, fixing model attention degradation and knowledge base drift. He details a production audit revealing that 10 of 13 skills ignored the resolver, and how he built trigger evals, a "check-resolvable" meta-skill to detect dark capabilities, and a self-healing loop against context rot. The piece reframes resolvers as the organizational chart and management of an agent system, and announces the open-sourcing of his personal architecture (GBrain/GStack) that embodies these patterns. Key evidence: a real agent managing 25,000 files and 200 daily inputs, with concrete metrics on skill reachability defects.
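To make the routing-table idea concrete, here is a minimal sketch under stated assumptions. It is not Tan's implementation: the trigger patterns, the skill file paths, and the `resolve` and `unreachable_skills` functions are all hypothetical. It shows a resolver that loads only the skill files matching a task, plus a simple reachability check of the kind the audit describes.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Route:
    """One row of the routing table: a trigger pattern and the skill file it loads."""
    pattern: str
    skill_file: str


# Hypothetical routing table; patterns and paths are illustrative only.
ROUTES = [
    Route(r"\b(article|essay|blog post)\b", "skills/file-article/SKILL.md"),
    Route(r"\b(calendar|meeting|schedule)\b", "skills/calendar-search/SKILL.md"),
]


def resolve(task: str, routes: List[Route] = ROUTES) -> List[str]:
    """Return only the skill files whose trigger matches the task text."""
    return [r.skill_file for r in routes if re.search(r.pattern, task, re.IGNORECASE)]


def unreachable_skills(all_skills: Set[str], routes: List[Route] = ROUTES) -> Set[str]:
    """Skills that exist but that no route points at (a basic reachability audit)."""
    return all_skills - {r.skill_file for r in routes}
```

The point of the design is that context is selected per task rather than loaded wholesale, and that a skill no route can reach is detectable mechanically.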

x.com · 12 min
04

On the LOC controversy: doing the math on a 810x developer output increase

Garry Tan, CEO of Y Combinator, responds to criticism of his claim of shipping 600,000 lines of production code in 60 days. He concedes LOC is a flawed metric but provides a before-and-after comparison: in 2013, as a part-time coder, he averaged 14 logical lines per day; in 2026, with the same day job, he averages 11,417 lines per day. Even after deflation for logical SLOC and an aggressive 2x AI-verbosity factor, the daily rate is 5,708 lines, a 408x increase. Quality data is provided: 2.0% revert rate, 6.3% fix commits, and a test suite that grew from 100 to over 2,000 tests. He details his testing infrastructure (Playwright-based browser CLI, slop-scan), product traction (75k GitHub stars, ~7k WAU), and argues the real shift is the collapse of the "idea to shipping" cycle from weeks to hours. The core argument is that the productivity ground has shifted for all engineers, not just him.
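The arithmetic behind these figures can be checked directly. The sketch below uses only the numbers quoted in the summary; the variable names are illustrative. How the 810x headline figure was derived is not stated here, and the raw ratio of the two quoted daily rates comes out slightly higher.

```python
BASELINE_2013 = 14      # logical lines per day, as quoted in the summary
RATE_2026 = 11_417      # lines per day, as quoted in the summary
VERBOSITY_FACTOR = 2    # the "aggressive 2x AI-verbosity" discount

raw_multiple = RATE_2026 / BASELINE_2013            # ratio before any discount
deflated_rate = RATE_2026 / VERBOSITY_FACTOR        # lines per day after the 2x discount
deflated_multiple = deflated_rate / BASELINE_2013   # the multiple the summary reports as 408x

print(raw_multiple, deflated_rate, round(deflated_multiple))
```

Halving 11,417 gives 5,708.5, which matches the quoted 5,708 lines per day, and dividing that by 14 rounds to the quoted 408x.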

www.anthropic.com · 27 min
05

Agentic coding and persistent returns to expertise (from 400,000 Claude Code sessions: domain expertise is the key to agentic coding success)

Anthropic analyzed ~400,000 Claude Code sessions, finding that users make most planning decisions while Claude handles execution. Domain expertise, not coding background, is the key to success: expert-rated sessions achieve verified success over twice as often as novice-rated ones, though intermediate users capture most of the benefit. Non-software occupations succeed at coding tasks within 5 points of software engineers. Over seven months, the share of debugging sessions fell from 33% to 19%, while end-to-end tasks like deployment, data analysis, and document writing grew, and estimated task value rose ~25%. The report details methodology for decision attribution, expertise classification, and success verification, along with limitations. Suitable for engineers and researchers interested in AI coding tools, agent collaboration, and skill transfer.

Section 02

The New Architecture Manifesto: Thin Harness, Fat Skills, Self-Evolving

6 / 21
x.com · 12 min
06

Thin Harness, Fat Skills (building self-evolving AI agent systems with five concepts)

YC partner Garry Tan argues that the bottleneck in AI agents is not model intelligence but context and process management. He introduces five definitions: Skill files (reusable Markdown procedures), a thin harness (a ~200-line loop for running the model and managing context), resolvers (routing tables that load the right context at the right time), the latent-versus-deterministic boundary (judgment vs. repeatable execution), and diarization (distilling structured intelligence from unstructured data). A real-world example from YC Startup School demonstrates how the same skill file, invoked with different parameters, handles breakout grouping, lunch matching, and real-time pairing, and then improves itself by analyzing mediocre feedback. The piece offers concrete design principles for engineers building agent systems that compound improvements over time.
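As a rough illustration of what a thin harness and the latent-versus-deterministic boundary might look like, here is a minimal sketch. Everything in it is an assumption for illustration: the `TOOL`/`ANSWER` reply protocol, the `TOOLS` table, and the `run_harness` function are hypothetical, and `model` is any callable that maps a prompt string to a reply string.

```python
from typing import Callable, Dict

# Deterministic side of the boundary: repeatable operations run as code,
# not as model judgment. The single tool here is a hypothetical example.
TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
}


def run_harness(task: str, skill_text: str, model: Callable[[str], str],
                max_turns: int = 5) -> str:
    """Show the model the skill and transcript, run tool requests, stop on an answer.

    Assumed protocol: the model replies "TOOL <name> <argument>" or "ANSWER <text>".
    """
    transcript = [f"SKILL:\n{skill_text}", f"TASK: {task}"]
    for _ in range(max_turns):
        reply = model("\n".join(transcript)).strip()
        if reply.startswith("ANSWER "):
            return reply[len("ANSWER "):]
        parts = reply.split(" ", 2)
        if len(parts) == 3 and parts[0] == "TOOL" and parts[1] in TOOLS:
            # Latent side decided *which* tool; deterministic code does the work.
            result = TOOLS[parts[1]](parts[2])
            transcript.append(f"TOOL_RESULT {parts[1]}: {result}")
        else:
            transcript.append("Reply with 'TOOL <name> <argument>' or 'ANSWER <text>'.")
    return "no answer within the turn budget"
```

The loop itself stays small; the procedure lives in the skill text and the precision work lives in the tools.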

x.com · 22 min
07

Skillify: turn every agent failure into a permanent structural fix

Garry Tan presents 'Skillify': a methodology that turns every AI agent failure into a permanent structural fix instead of relying on prompt tweaks or apologies. Using two real failures—an agent bypassing a local script for calendar search and doing mental timezone math—he walks through a 10-step verification checklist: SKILL.md contract, deterministic script, unit tests, integration tests, LLM evals, resolver trigger, resolver eval, reachability audit, smoke test, and brain filing rules. This workflow is built into GBrain, an open-source knowledge engine that ensures agent judgment improves permanently and verifiably. Targeted at developers frustrated by recurring agent mistakes.
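The timezone failure is a good example of work that belongs in a deterministic script rather than in the model's head. The sketch below is an assumption about what such a script could look like, not the one shipped in GBrain; the function name `convert_local_time` is hypothetical. It relies on Python's standard `zoneinfo` module, which needs the IANA time zone database to be available on the system.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def convert_local_time(local_iso: str, source_zone: str, target_zone: str) -> str:
    """Convert a wall-clock time in one IANA zone to another, honoring DST rules."""
    naive = datetime.fromisoformat(local_iso)
    aware = naive.replace(tzinfo=ZoneInfo(source_zone))
    return aware.astimezone(ZoneInfo(target_zone)).isoformat()
```

For instance, 09:00 in Los Angeles maps to 17:00 in London on 2026-01-15 but to 16:00 on 2026-03-09, because the two regions switch to daylight time on different dates. That kind of edge case is where mental arithmetic tends to go wrong, and it is also what the unit-test step of the checklist would pin down.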

x.com · 4 min
08

How to build a self-improvement loop for your Skills (inner and outer loops, with cloud agents in practice)

This article demonstrates a practical approach to building a self-improvement loop for AI Skills using inner and outer agent loops. The inner loop triggers a cloud agent via GitHub Action on each new issue, applying a triage Skill to classify it. The outer loop runs daily, reviews all human corrections (label changes and comments), and generates a diff to update the Skill file, which is then merged back. The author uses Warp's Oz cloud agent platform for issue triage, providing complete code and a sample repo. The pattern is generalizable to code review, bug fixing, and incident response. Suitable for engineers building AI agents who want to improve skill quality over time.
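The outer loop's first step, turning human corrections into candidate guidance for the Skill file, can be sketched as follows. This is an illustrative assumption, not the article's code and not part of any Warp API: the `summarize_corrections` function and the shape of the input data are hypothetical. In the described workflow an agent would turn notes like these into a diff for human review.

```python
from collections import Counter
from typing import List, Tuple


def summarize_corrections(corrections: List[Tuple[str, str]],
                          min_count: int = 2) -> List[str]:
    """Summarize repeated label corrections.

    Each correction is (label the agent applied, label a human changed it to).
    Pairs seen fewer than `min_count` times are treated as noise.
    """
    pairs = Counter((agent, human) for agent, human in corrections if agent != human)
    return [
        f"- Issues the agent labeled '{agent}' were relabeled '{human}' by a human {count} times."
        for (agent, human), count in pairs.most_common()
        if count >= min_count
    ]
```

Filtering on a minimum count keeps one-off disagreements from rewriting the Skill, which is one plausible way to keep the loop stable.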

x.com · 20 min
09

Decomposing the agent harness into swappable workers: the iii engine architecture

Mike Piccolo argues that monolithic agent frameworks force a tradeoff by bundling the loop, tools, memory, and orchestration into one block, which long-running teams inevitably rewrite. He walks through the iii engine's production worker stack, where all thirteen harness responsibilities—credential resolution, policy checks, turn FSM, session persistence, budget tracking, etc.—are decomposed into 11 independently replaceable workers. Each worker connects to the engine via WebSocket and registers functions and triggers using a single primitive (iii.trigger()), making the harness a composable set of installable workers. The post provides a step-by-step trace of a turn through provisioning, streaming, policy-gated tool dispatch, and reactive approval wake-ups, alongside concrete examples of swapping the model catalog, adding a provider, or integrating a Slack approval surface. The core bet: an agent harness should be a slider of composable workers rather than a framework you fork. This is for backend engineers building or scaling custom agent infrastructure who are hitting the composability limits of existing frameworks.
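The "register functions, invoke them through one primitive" idea can be illustrated with a small in-process sketch. This is not the iii engine's API: the `Engine` class, its `register` and `trigger` methods, and the function ids are hypothetical, and the real system connects workers over WebSocket rather than in one process.

```python
from typing import Any, Callable, Dict


class Engine:
    """Toy registry: workers register named functions, callers trigger them by id."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[[dict], Any]] = {}

    def register(self, function_id: str, handler: Callable[[dict], Any]) -> None:
        # Registering an existing id replaces the handler, which is how a
        # worker gets swapped without touching its callers.
        self._functions[function_id] = handler

    def trigger(self, function_id: str, payload: dict) -> Any:
        if function_id not in self._functions:
            raise LookupError(f"no worker registered for {function_id!r}")
        return self._functions[function_id](payload)
```

Because callers only know the function id, a permissive policy worker can be replaced by a stricter one with a single `register` call, leaving the rest of the harness unchanged.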

github.com · 8 min
10

Claude Official Cookbooks: Engineering Recipes from RAG to Multimodal Agents

Anthropic's official collection of practical coding recipes for building with Claude. It provides runnable Jupyter notebooks covering capabilities like classification, summarization, and RAG, alongside advanced techniques such as tool use, multimodal vision, and sub-agent orchestration. The latest additions introduce the Claude Agent SDK and Managed Agents, demonstrating how to build observable, hostable agents—from research assistants to SRE bots—with just a few lines of code.

github.com · 11 min
11

Hermes Agent: A Self-Improving, Multi-Platform AI Agent Runtime

Hermes Agent is a self-improving AI agent framework with a closed learning loop. It creates skills from experience, manages persistent memory across sessions, and operates over Telegram, Discord, Slack, and CLI via a single gateway. Any LLM backend can be used without code changes, and it runs on a $5 VPS or serverless infrastructure with near-zero idle cost. Built-in cron scheduling, subagent delegation, and batch trajectory generation make it suitable for engineers and researchers who need an autonomous agent that evolves with use.

Section 03

Making Memory and Context Flow: Cross-Agent Infrastructure

6 / 21
x.com · 7 min
12

Stop Giving Every Agent Its Own Skull

Pejman argues that we are replicating a core human limitation, knowledge siloed in individual brains, inside agent systems. He uses OpenClaw, Codex, and Claude Code, and each agent retains its own isolated context about him and his projects. The critical gap is not in the repo's artifacts but in the session itself: the debates, dead ends, and pruned idea branches that markdown cannot capture. With literal physical separation across machines, this fragmentation intensifies. The missing layer is a shared, user-owned memory substrate that transcends agent boundaries. He highlights GBrain and CASS as early signals tackling parts of this problem. The piece resonates with engineers building or deeply integrating multi-agent workflows.

github.com · 14 min
13

Persistent Memory Engine for AI: Auto-Extract, Update, and Forget Intelligently

Supermemory is a memory and context layer for AI. It automatically extracts facts from conversations, builds and maintains user profiles, resolves contradictions, and intelligently forgets expired information. Combining hybrid search (RAG + memory), document processing, and live connectors (Google Drive, GitHub, etc.) into one API, it gives AI agents instant, personalized context. With plugins for Claude Code, Cursor, and more, it targets both developers integrating memory into apps and users wanting persistent AI memory across tools.

github.com · 18 min
14

The Context Compression Layer for AI Agents: 60–95% Fewer Tokens, Zero Accuracy Loss

Headroom is a local-first context compression layer for AI agents that slashes token usage from tool outputs, logs, files, and RAG chunks by 60–95% before they reach the LLM, with preserved accuracy. It offers library, proxy, MCP server, and agent wrapper modes, using a content router to select the best compressor for JSON, code, or prose. Reversible compression ensures originals are retrievable on demand. With cross-agent memory and `headroom learn` for mining failed sessions, it is ideal for engineers running coding agents daily and anyone seeking to slash LLM costs without changing their workflow.

github.com · 18 min
15

A Local-First Context Compression Layer for AI Agents: Library, Proxy, and MCP in One Stack

Headroom is a local-first context compression layer built specifically for AI coding agents. It slashes token consumption by 60-95% by compressing tool outputs, logs, files, and RAG results before they reach the LLM, all while maintaining answer accuracy. Usable as a Python/TypeScript library, a transparent proxy, a CLI wrapper for popular agents, or an MCP server, it fits into existing workflows without friction. Internally, it combines JSON structure-aware compression, AST-based code minification, and a custom fine-tuned model, grounded by a novel CCR reversible compression system that guarantees original data is never lost. This tool is ideal for engineers who rely heavily on coding agents and want to cut API costs without altering their current toolchain.
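Two of the ideas described here, routing content to a type-specific compressor and keeping the original retrievable, can be sketched in a few lines. This is a toy illustration under stated assumptions, not Headroom's implementation or API: all function names are hypothetical, and the compressors shown are simple lossless minifiers, far less aggressive than what a 60-95% reduction would require.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple

# Hypothetical in-memory store mapping a short key to the untouched original.
_originals: Dict[str, str] = {}


def compress_json(text: str) -> str:
    """Drop insignificant whitespace from a JSON document."""
    return json.dumps(json.loads(text), separators=(",", ":"))


def compress_prose(text: str) -> str:
    """Collapse runs of whitespace in free text."""
    return " ".join(text.split())


def pick_compressor(text: str) -> Callable[[str], str]:
    """Content router: JSON objects and arrays get the JSON path, the rest is prose."""
    try:
        parsed = json.loads(text)
    except ValueError:
        return compress_prose
    return compress_json if isinstance(parsed, (dict, list)) else compress_prose


def compress(text: str) -> Tuple[str, str]:
    """Return (retrieval key, compressed text) and remember the original."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    _originals[key] = text
    return key, pick_compressor(text)(text)


def retrieve(key: str) -> str:
    """Fetch the original content on demand."""
    return _originals[key]
```

Keeping the original behind a key is what lets a compressor be aggressive about what the model sees while nothing is actually discarded.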

x.com · 10 min
16

Building cloud agent infrastructure: what's different, and what we learned (from desktop to cloud: two lessons)

A hands-on report from CREAO detailing the architectural challenges of moving AI agents from a single-user desktop to a multi-tenant cloud sandbox. It presents two hard-won lessons. First, decouple slowly-changing user environments from fast-changing platform code by freezing user sandboxes into snapshots and hot-swapping the runner library in ~300ms via an atomic sequence involving chattr, V8 compile cache purging, and post-run re-snapshotting. Second, enforce strict credential isolation by ensuring no long-lived secrets ever enter the sandbox; a host-side API bridge verifies sandbox calls using a dual check of IP allowlisting and short-lived, per-run JWTs, so a compromised agent yields only an expiring, network-pinned token. Concrete commands, validation steps, and design rationale included. Recommended for backend and infrastructure engineers productizing agents in shared environments.
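The dual check in the second lesson can be sketched as below. This is an illustration under stated assumptions, not CREAO's code: the secret, the allowlisted address, and both function names are hypothetical, and an HMAC-signed token built from the standard library stands in for a real JWT to keep the example dependency-free.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

HOST_SECRET = b"host-side-secret"   # hypothetical; never enters the sandbox
ALLOWED_IPS = {"10.0.0.12"}         # hypothetical sandbox address


def _sign(body: str) -> str:
    return hmac.new(HOST_SECRET, body.encode(), hashlib.sha256).hexdigest()


def issue_token(run_id: str, ttl_seconds: int = 300, now: Optional[float] = None) -> str:
    """Mint a short-lived token for one run (host side)."""
    now = time.time() if now is None else now
    payload = json.dumps({"run": run_id, "exp": now + ttl_seconds}).encode()
    body = base64.urlsafe_b64encode(payload).decode()
    return f"{body}.{_sign(body)}"


def verify_call(token: str, source_ip: str, now: Optional[float] = None) -> bool:
    """Accept a sandbox call only if the IP is allowlisted AND the token is valid and unexpired."""
    now = time.time() if now is None else now
    if source_ip not in ALLOWED_IPS:
        return False
    body, _, signature = token.partition(".")
    if not hmac.compare_digest(signature.encode(), _sign(body).encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    return now < claims["exp"]
```

With both checks in place, a token leaked from a compromised sandbox is useless from another address and stops working once its lifetime ends.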

claude.com · 32 min
17

Anthropic's Analytics Agent Stack: Tackling Entity Ambiguity, Staleness, and Retrieval Failure

Anthropic’s data team shares how they use Claude to automate 95% of business analytics queries at roughly 95% accuracy. They identify three core failure modes (concept-entity ambiguity, data staleness, and retrieval failure) and describe a four-layer agentic stack to address them: data foundations (canonical datasets, rigorous governance), sources of truth (semantic layer, lineage, business knowledge graph), skills (knowledge and procedural skills, which lifted accuracy from ~21% to >95%), and validation (offline evals, adversarial review, online monitoring). The post includes concrete practices such as colocating docs with code, treating metadata as a first-class product, and an appendix with a skill file skeleton. It is aimed at data engineers and analysts building LLM-powered self-service analytics.

Section 04

From Individual to Enterprise: Scaling Practices and Industry Impact

4 / 21
x.com · 5 min
18

Factory 2.0: From coding agents to software factories

Factory announces its 2.0 release, repositioning from individual AI coding agents to an end-to-end 'software factory'. The post argues that improving individual productivity is no longer enough; enterprises need an interconnected, agent-native system that forms a continuous feedback loop from signals (bug reports, customer feedback) through planning, building, testing, reviewing, securing, shipping, and monitoring. Key design principles include model independence (allowing deliberate model choice or automatic routing per task), sovereign intelligence (data plane and control plane options from cloud to fully air-gapped, with all agent sessions and reviews feeding back into the system), and continual learning and self-improvement across the lifecycle. The article lists customers such as NVIDIA, EY, Adobe, and Palo Alto Networks already running software factories in production. Autonomy is described as a gradual maturation process, using simple Droids, skills, automations, Droid Computers, and multi-agent Missions for different levels of human guidance and agent readiness. The piece is a product launch announcement with some technical concepts, targeting engineers and managers interested in enterprise AI engineering and agent orchestration.

x.com · 5 min
19

A frontier without an ecosystem is not stable

Satya Nadella argues that the future of the firm in an AI-driven economy relies on creating a compounding learning loop that integrates human capital and AI 'token capital.' He emphasizes that organizations must build agentic systems that own their institutional knowledge and private RL environments, ensuring they can swap underlying models without losing proprietary expertise. Warning against a future where a few models commoditize all value, he advocates for building a 'frontier ecosystem' that enables broad value distribution across every industry, rather than solely chasing a frontier model. This piece targets executives and senior technologists strategizing AI adoption.

mp.weixin.qq.com · 1 min
20

Kimi Code + K2.7 Code Hands-On: Can It Replace Claude Code?

A hands-on evaluation of Kimi Code paired with the K2.7 Code model as a potential Claude Code replacement. Tests include using video understanding to replicate an ink-wash animation, using the /goal command to autonomously compress a 2.1MB image to below 120KB, and running a suite of web UI, game, and animation programming challenges. Kimi Code is found to be highly compatible with Claude Code's commands and permission system. The /goal command enables fully unattended task execution. The K2.7 model demonstrates stable code generation capability with a claimed 30% average reduction in reasoning token consumption. A unique built-in Datasource plugin allows querying real-time financial data, company records, and academic papers via natural language within the CLI.

github.com · 28 min
21

A Structured Cybersecurity Skills Library Purpose-Built for AI Agents

This is not another collection of security scripts or checklists. It’s an AI-native knowledge base that encodes 754 practitioner-grade cybersecurity workflows into a structured, agent-readable format. Each skill carries YAML frontmatter for sub-second discovery and step-by-step Markdown procedures, essentially giving any LLM-based agent the decision-making playbook of a senior analyst. The library spans 26 domains—from DFIR and threat hunting to cloud security and OT/ICS—and maps every skill to MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND, and NIST AI RMF, making it uniquely suited for security professionals integrating AI into real operational workflows.
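To show why frontmatter makes discovery cheap, here is a minimal sketch of reading metadata without parsing the procedure body. It is an assumption-laden illustration, not the library's tooling: the `domain` field and both function names are hypothetical, and the parser handles only flat `key: value` lines, so real YAML frontmatter would call for a proper YAML parser.

```python
from typing import Dict, List


def parse_frontmatter(skill_text: str) -> Dict[str, str]:
    """Read flat `key: value` pairs between the leading pair of `---` lines."""
    lines = skill_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the Markdown procedure follows
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def find_skills(skills: Dict[str, str], domain: str) -> List[str]:
    """Return names of skills whose (hypothetical) `domain` field matches."""
    return sorted(
        name for name, text in skills.items()
        if parse_frontmatter(text).get("domain") == domain
    )
```

An agent can filter hundreds of skills on metadata alone and load the full step-by-step procedure only for the one it selects.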