#006 Latest 6/29–7/5 Published Jul 5

The Loop Is the Harness: Models Are Commoditized, the System Is Just Beginning

This week marks a quiet but complete paradigm shift: the center of gravity in agent development has moved from 'writing a good prompt' to 'designing a good loop.' That while-loop is no longer a plumbing detail—it determines whether a system converges, how it brakes, and whether costs spiral out of control. Models are being commoditized fast, and the harness—the system that assembles models, tools, context, feedback, and constraints into a loop—is what separates the mediocre from the exceptional. Our 24 picks, from Boris Cherny's loop engineering manifesto and Claude Code's four official loop patterns, to context caching engineering, Skill design philosophy, and multi-model collaboration at the serving layer, form a construction manual for this new mindset. After this issue, you'll stop asking which model is better—you'll ask: can my loop survive a night, stay within budget, and deliver usable results?

24 picks 6 sections ~6 hr

Section 01

The Loop as System: An Engineering Roadmap from Prompt to Autonomy

7 / 24

x.com · 8 min

Loop Engineering: When Prompting Takes a Back Seat to the System (Loop Engineering: When the Prompt Is No Longer the Lead, the Core of Agent Systems Shifts)

This article, inspired by Claude Code creator Boris Cherny, argues that the center of gravity in agent development has shifted from prompt engineering to loop engineering. It unpacks the trivial core loop and identifies four hard problems: knowing when to stop (distinguishing tool-call cessation from task completion), maintaining context hygiene to avoid decay, designing tools that agents can actually use (idempotent writes, error messages for LLMs), and embedding a critic in the loop to prevent self-agreement. The piece underscores that the model is commoditized; the loop—the harness—is where real engineering value lies. A must-read for engineers building autonomous agent systems.
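The first of those hard problems, knowing when to stop, can be sketched in a few lines. This is a minimal illustration and not code from the article: `Turn`, `run_loop`, `step`, and `is_done` are hypothetical names, and `step` stands in for a real model call. The point it tries to show is that the loop exits on an external completion check, not merely because the model stopped calling tools.

```python
from dataclasses import dataclass, field
from typing import Callable

NUDGE = "check failed: task not complete, continue"  # hypothetical feedback message


@dataclass
class Turn:
    """One model turn (hypothetical shape): free text plus any tool calls."""
    text: str = ""
    tool_calls: list[str] = field(default_factory=list)


def run_loop(step: Callable[[list[str]], Turn],
             is_done: Callable[[list[str]], bool],
             max_iters: int = 10) -> tuple[str, int]:
    """Drive `step` until the external check `is_done` passes or the budget runs out."""
    history: list[str] = []
    for i in range(1, max_iters + 1):
        turn = step(history)
        history.append(turn.text)
        history.extend(turn.tool_calls)
        # Completion is decided outside the model.
        if is_done(history):
            return "done", i
        # A turn without tool calls only means the model went quiet,
        # so feed the failed check back instead of exiting.
        if not turn.tool_calls:
            history.append(NUDGE)
    return "budget_exhausted", max_iters
```

The hard cap on iterations acts as the simplest possible brake; a production loop would likely add cost and time limits as well.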

x.com · 18 min

Loop Engineering: A Technical Roadmap for an Autonomous Loop (Loop Engineering: A Technical Roadmap for Automated Loops That Won't Burn Your Budget While You Sleep)

This is a technical roadmap for building reliable autonomous loops, arguing that a loop is fundamentally different from a prompt—a prompt requires manual initiation while a loop drives itself: set a goal once, then the system finds work, executes, checks, fixes, and repeats until completion. The author emphasizes that the ceiling is set not by prompting skills but by engineering a loop that converges toward truth rather than becoming an expensive random walk. The piece provides step-by-step guidance (Step 0 through Step 7) with working code (Bash scripts), explaining the mechanics of stateless iteration (fresh context per turn to combat context rot), building a narrow relevant context with a token budget, designing incorruptible checks (external deterministic oracle + reward-hacking defense gates + adversarial judge on a different model), dual-level state persistence (human-readable STATUS.md + machine-parseable JSON), physical isolation (git worktree, container with --network none), brakes with observability (structured JSONL log, circuit breakers for stuck/repeated failures, liveness heartbeats), and nonlinear cost analysis (why stateless keeps per-iteration cost constant while stateful grows quadratically). This is aimed at production engineers building AI agent pipelines who need practical, verifiable techniques.
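The cost claim at the end of that summary can be checked with simple arithmetic. The sketch below is not the article's code; it assumes a stateful loop re-reads its whole history each turn and that the history grows by a fixed number of tokens per iteration. The function names and the example figures are illustrative only.

```python
def stateless_cost(iters: int, context_tokens: int) -> int:
    """Fresh context every turn: each iteration reads the same bounded context."""
    return iters * context_tokens


def stateful_cost(iters: int, context_tokens: int, growth_per_iter: int) -> int:
    """Accumulating context: iteration k re-reads the base context plus k rounds of history."""
    return sum(context_tokens + k * growth_per_iter for k in range(iters))


# Illustrative numbers (assumptions, not measurements):
# 100 iterations, 8,000-token base context, 2,000 tokens of new history per turn.
stateless_total = stateless_cost(100, 8_000)        # 800,000 input tokens
stateful_total = stateful_cost(100, 8_000, 2_000)   # 10,700,000 input tokens
```

Under these assumptions the stateful total is n·c + g·n(n−1)/2 for n iterations, base context c, and growth g, so the second term grows quadratically in n while the stateless total stays linear.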

x.com · 7 min

The 5 Levels of Loop Design: From Prompting to Autonomous Agents (From Prompting to Autonomy: Five Stages of Designing AI Work Loops)

The creator of Claude Code says he no longer writes prompts; his loops prompt the model instead. This post introduces a five-level progression of human-AI workflows: Level 1 (single-turn prompting), Level 2 (a manual do-check-correct loop), Level 3 (a verified loop with separate judges deciding what counts as 'done'), Level 4 (a self-running loop using the /goal command with guardrails), and Level 5 (autonomous systems where loops self-start, run in parallel, and persist lessons into a skill base). Each level comes with a tell and a concrete next step. For developers who still feel they are 'babysitting' their AI agents.

claude.com · 8 min

Getting started with loops (Claude Code Loop Patterns: An Engineering Guide from Manual Checks to Scheduled Tasks)

This article is an official engineering guide from Claude Code that systematically lays out four agentic loop patterns and their use cases. Turn-based loops are for short exploratory tasks; users encode manual verification steps into SKILL.md — e.g., asking Claude to start a dev server, take screenshots, and check the browser console. Goal-based loops, triggered by /goal, define deterministic termination criteria such as 'get the Lighthouse score to 90 or above' and force iteration until the target is met. Time-based loops come in two flavors: /loop for local polling on an interval and /schedule for cloud-triggered routines, ideal for recurring work like PR review or CI fixups. Proactive loops combine /schedule, /goal, dynamic workflows, and auto mode into a pipeline for long-running, well-defined streams of work. The article also covers code quality maintenance and token usage management: encoding conventions, using scripts instead of re-reasoning, routing routine work to cheaper models, and monitoring cost with /usage. Suitable for engineers embedding Claude Code into daily dev workflows.

cursor.com · 13 min

Continually Improving Our Agent Harness (Continuous Improvement of Cursor's Agent Harness: From Context Management to Model Customization)

Cursor shares how it continuously improves its agent harness, covering context window evolution from static to dynamic fetching, a two-layer evaluation system (offline benchmarks and online A/B tests measuring code keep rate and user satisfaction), tool call error classification and repair pipeline (anomaly detection + automated log analysis with Cloud Agents), per-model customization of tool formats and prompts (e.g., patch vs. string replacement), and mid-chat model switching with specialized instructions. The post concludes with a vision of multi-agent architectures where the harness orchestrates specialized sub-agents.

justinyan.me · 4 min

Superpowers: How to Make an AI Agent Run All Night and Deliver Usable Results

The author shares their journey from a failed attempt at orchestrating long-running AI agent tasks to discovering the Superpowers Skill Set, which solves the core pain points. Superpowers decomposes the development workflow into three phases: brainstorming, writing-plans, and executing-plans (with subagent-driven-development). Key design elements include: using separate prompt templates (implementer, spec-reviewer, code-quality-reviewer) to enforce separation of concerns; spinning up a fresh subagent for each task to avoid cascading context pollution; using hard constraints like "Never/HARD-GATE" to prevent agent deviation; and enforcing software engineering best practices such as TDD, DRY, and YAGNI. The author argues that with frontier models like Opus 4.8 and Codex GPT-5.5 now being sufficiently capable, the real bottleneck is harness design—using clear specifications and structured processes to make even cheaper models reliable for long-duration tasks.

blog.fsck.com · 8 min

Superpowers 6: Cutting Build Cost 60% via Autoresearch Loop

Superpowers 6 has been released, with its biggest improvements driven by an automated research loop. The author used Anthropic's Fable model (briefly available) to systematically optimize their Subagent Driven Development pipeline. Over 36 hours and ~$165 in token spend, 25 experiments were run, yielding a 50% reduction in wall-clock time and a 60% reduction in token consumption vs. v5. Key optimizations: merging the spec compliance and code review agents, pre-baking review packets to minimize git operations, and dynamic agent allocation based on task type (e.g., using the cheaper Haiku model for non-code plans). The post also documents falsified hypotheses (e.g., capping controller thinking backfires) and emphasizes the role of their eval suite in rigorous measurement.

Section 02

Context as Architecture: Fighting Memory Decay and Context Rot

3 / 24

x.com · 21 min

Context Engineering for AI Agents: The Complete Playbook (The Complete Handbook of Context Engineering for AI Agents: Why Your Agent Starts Getting Worse at Step 15)

This article systematically explains why context engineering is the most critical skill for building reliable AI agents. It argues that agent degradation usually stems from poor context window management rather than model limitations. The context window is likened to RAM, and as tool outputs, retrieval results, and conversation history accumulate, attention thins and the “Lost in the Middle” effect kicks in. Four core strategies are presented: Write (persist information outside context), Select (just-in-time retrieval), Compress (proactively reduce tokens), and Isolate (separate contexts for different jobs). The article details four failure modes—poisoning, distraction, confusion, and clash—and offers concrete evidence: Chroma benchmarks show continuous performance decline well before token limits, RAG‑MCP improved tool selection accuracy from 14% to 43% while halving token usage, and KV‑cache hit rates can yield a 10× cost reduction. A real-world workflow that shipped ~35,000 lines of Rust code in 7 hours using frequent intentional compaction is presented. The target audience is engineers building production‑grade agents.
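Of the four strategies, Compress is the easiest to illustrate. The sketch below is an assumption-laden toy, not the article's workflow: `estimate_tokens` counts whitespace-separated words in place of a real tokenizer, and `default_summary` is a placeholder where a real system would call a model. It shows the basic shape of intentional compaction: once the history exceeds a budget, older messages collapse into one summary while the most recent ones stay verbatim.

```python
from typing import Callable


def estimate_tokens(messages: list[str]) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return sum(len(m.split()) for m in messages)


def default_summary(old: list[str]) -> str:
    """Placeholder summarizer; a real system would ask a model to summarize `old`."""
    return f"[summary of {len(old)} earlier messages]"


def compact(messages: list[str], budget: int, keep_recent: int = 2,
            summarize: Callable[[list[str]], str] = default_summary) -> list[str]:
    """Replace everything except the last `keep_recent` messages with one summary."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if estimate_tokens(messages) <= budget or len(messages) <= keep_recent:
        return list(messages)  # under budget, or nothing old enough to compress
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(old)] + recent
```

Running compaction proactively, before the window fills, matches the summary's point that quality declines well before the token limit is reached.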

x.com · 21 min

Building a Good Vertical Agent: Context as a Cache Hierarchy

The article argues that a good vertical agent is a faithful compression of its task distribution, and that its context should be organized as L1/L2/L3 cache tiers. Using the Shortcut spreadsheet agent as an example, the authors detail extreme optimizations: reading a range compresses 500 formulas into a single legend line via R1C1 normalization and aliasing; after writing, a structured diff groups, samples, and triages changes, flagging #REF! errors under MUST FIX. L2 provides curated English specs fetched on demand, like the pivot table recipe that bakes in gotchas (suspendLayout/resumeLayout, raw integer 8 for aggregation). L3 is the raw API reference plus a 100-line grep skill that lets the model mine tens of thousands of lines in bounded steps. The prompt budget mirrors the frequency curve, and the hierarchy moves as models improve. Practical, transferable advice for engineers building reliable agents in any domain.
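To see why R1C1 normalization compresses so well, consider a column of fill-down formulas: in A1 notation every cell's text differs, but in relative R1C1 notation they are identical. The sketch below is one possible illustration, not Shortcut's implementation. It handles only plain relative references such as `A2` (no `$` anchors, sheet names, or function names that end in digits), and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches simple relative A1-style references such as A2 or BC17 (assumption:
# no absolute "$" markers, no sheet prefixes).
REF = re.compile(r"\b([A-Z]{1,3})([0-9]+)\b")


def col_to_num(col: str) -> int:
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def _part(tag: str, delta: int) -> str:
    """Relative R1C1 component: 'R' or 'C' alone for zero offset, else 'R[d]' / 'C[d]'."""
    return tag if delta == 0 else f"{tag}[{delta}]"


def to_r1c1(formula: str, row: int, col: int) -> str:
    """Rewrite each reference as an offset from the cell at (row, col)."""
    def repl(m: re.Match) -> str:
        return _part("R", int(m.group(2)) - row) + _part("C", col_to_num(m.group(1)) - col)
    return REF.sub(repl, formula)


def legend(cells: dict[tuple[int, int], str]) -> dict[str, int]:
    """Group cells by normalized formula and count how many share each pattern."""
    groups: dict[str, int] = defaultdict(int)
    for (row, col), formula in cells.items():
        groups[to_r1c1(formula, row, col)] += 1
    return dict(groups)
```

For example, 500 cells in column C holding `=A2+B2`, `=A3+B3`, and so on all normalize to `=RC[-2]+RC[-1]`, so the legend needs a single entry with a count of 500.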

www.aihero.dev · 5 min

How To Make Codebases AI Agents Love (How to Make Your Codebase an Ideal Home for AI Agents: Deep Module Design in Practice)

This article argues that codebase structure matters more than prompts or AGENTS.md files for AI agent output quality. The core idea is applying 'deep modules' from A Philosophy of Software Design: each module exposes a simple interface controlling lots of implementation. The author introduces 'grey box modules'—developers own and test the interface, AI owns the implementation inside. This improves feedback loops (tests are feedback), navigability (filesystem mirrors mental model), and reduces cognitive load (developers only track 7-8 module boundaries). The article notes TypeScript's difficulty enforcing boundaries and recommends the Effect library. For engineers optimizing AI coding workflows.

Section 03

Skills as Product: Engineering Reusable Units of Expert Experience

4 / 24

www.aihero.dev · 8 min

5 Agent Skills I Use Every Day to Encode My Development Process

Matt Pocock, a seasoned engineer, shares 5 agent skills he uses daily to encode rigorous, repeatable processes for LLM agents, addressing their lack of memory and tendency to drift. Key skills include: grill-me (exhaustive questioning before coding), to-prd (turning discussions into PRDs), to-issues (slicing PRDs into vertical issues), tdd (forcing red-green-refactor cycles for quality), and improve-codebase-architecture (identifying shallow modules for deepening). The core insight: short, well-crafted skills can dramatically boost agent output quality.

x.com · 13 min

What I Learned About Agent Skills from Building Popular Ones

The author, having built several popular Skills (PPT, social media cards, logo generator, AI desk card), argues that Agents amplify rather than erase capability gaps. A Skill is defined as a reusable capability unit that bundles expert experience, workflows, taste, and tool calls. Core insights: Skill design is externalizing human taste as constraints (e.g., no pure white/black, text must not cover faces); architecture should be 'short center, thick radius' with SKILL.md holding only high-signal flow; quality must be maintained like code, with gotchas from real failures being the most valuable; the ecosystem should present each Skill as a feature page, not a repository list; distribution relies on GitHub for cross-platform reach and content platforms for community building, creating a flywheel of articles, products, and use cases. A full lifecycle from real need to feedback iteration is proposed. The article is aimed at AI Agent developers, product managers, and content creators, offering concrete cases and actionable design principles.

claude.com · 11 min

Steering Claude Code: CLAUDE.md, skills, hooks, rules, subagents and more (A Deep Guide to Claude Code Configuration: Rules, Skills, Subagents, and Hooks Explained)

This official guide from Claude Code maps out seven mechanisms for injecting instructions: CLAUDE.md, rules (with optional path scoping to save tokens), skills (dynamically loaded on invocation), subagents (fully isolated context, ideal for side tasks), hooks (deterministic triggers with low context cost), output styles (highest instruction weight, but replace defaults), and append-system-prompt (additive but has diminishing returns). It details when each loads, its context cost, and typical use cases. Key advice: use hooks for deterministic behavior over CLAUDE.md, skills for multi-step procedures, path-scoped rules for API-specific constraints, and managed settings for non-overridable guardrails. Aimed at engineers customizing Claude Code for production workflows.

justinyan.me · 3 min

Switching from Superpowers to mattpocock/skills: Less Token Waste, More Control

The author shares a real-world comparison between Superpowers and mattpocock/skills, explaining why they switched. Superpowers uses hooks to enforce a rigid workflow, which is helpful for novices but often overcomplicates simple tasks and burns excessive tokens. mattpocock/skills takes a 'real engineer' approach, giving control back to the user via explicit commands like /grill-with-docs, /to-prd, /to-issues, and /implement. Key advantages: lower token consumption, built-in debugging (/tdd, /diagnosing-bugs), model handoff (/handoff), and architecture refactoring (/improve-codebase-architecture). The author pairs these skills with Fable 5 and Codex 5.5 models, storing PRDs and issues on GitHub for traceability. A candid take for engineers evaluating agent frameworks and tooling.

Section 04

Hands for the Agent: Practical Browser Control and Design Automation

3 / 24

github.com · 64 min

Browser Automation CLI for AI Agents

agent-browser is a native Rust CLI designed for AI agents to automate browser interactions. It uses a client-daemon architecture where the Rust daemon directly communicates with Chrome via CDP, eliminating the Node.js dependency. The tool offers a comprehensive command set covering navigation, element interaction (via ref/CSS/XPath/text selectors), snapshots, screenshots, network interception, session management, and authentication state persistence. It includes built-in safety features like domain allowlists, action policies, and encrypted state storage. It is optimized for AI workflows with accessibility tree snapshots, annotated screenshots, and MCP server support, making it ideal for engineers building AI agents, automated testing, web scraping, or enabling LLMs to control browsers reliably.

github.com · 7 min

Self-Healing Browser Harness That Lets LLMs Drive Any Real Browser

Browser Harness is a thin, self-healing CDP harness that connects an LLM directly to a real Chrome browser via a single WebSocket, with zero intermediate layers. When the agent needs to perform an action it hasn't seen before (e.g., file upload, cross-origin iframe interaction, drag and drop), it writes the missing helper code on the fly and saves it into an agent-workspace for reuse. The core package is roughly 1K lines, enabling complete freedom for browser automation tasks. Aimed at developers who need AI agents to perform real, unconstrained browser interactions.

github.com · 35 min

Local-first, agentic design workspace with 22 CLI agents and 150+ brand systems

Open Design is a local-first, open-source alternative to Claude Design. It is agent-native, meaning it doesn't ship its own agent but works with 22 coding-agent CLIs (Claude Code, Codex, Cursor, Copilot, etc.) already on your PATH. Using MCP, the agents read DESIGN.md brand systems, skills, and plugins to generate prototypes, live dashboards, decks, images, videos, and HyperFrames. Exports to HTML, PDF, PPTX, MP4. Supports BYOK for any OpenAI-compatible endpoint. Ships 100+ skills, 150+ brand-grade design systems, and 261 plugins. Ideal for engineers and designers who want brand fidelity and local control.

Section 05

Multi-Model Collaboration: Surpassing Single Frontier Models at the Serving Layer

4 / 24

vllm.ai · 14 min

Micro-Agent: Beat Frontier Models with Collaboration inside Model API

The vLLM Semantic Router proposes a different take: a router is not just a request dispatcher but an amplifier of model capability. The core idea is to encapsulate multi-model collaboration inside a single model API call. The user sees one model endpoint (vllm-sr/auto), but behind it the router can automatically select a collaboration pattern: cost-aware escalation (Confidence), parallel aggregation (Ratings), repeated mixture-of-model reasoning (ReMoM), disagreement-as-signal (Fusion), or budgeted micro-agent workflows (Workflows). These patterns are controlled, configurable, observable runtimes, not application glue. Benchmarks on LiveCodeBench, GPQA-Diamond, and Humanity's Last Exam show the closed-model collaboration scheme (VSR Closed) achieving 92.6%, 96.0%, and 50.0% respectively, matching or beating single frontier models like Fugu Ultra and GPT-5.5. The article's main contribution is moving multi-model collaboration from the product or application layer down to the serving infrastructure layer while preserving a single model identity. For engineers building inference routing, multi-model strategies, or cost optimization.
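The cost-aware escalation pattern (Confidence) can be pictured as a short loop over model tiers. The sketch below is a generic illustration under stated assumptions, not the vLLM Semantic Router's API or configuration: `Model`, `escalate`, the tier names, and the idea that each model returns a confidence score in [0, 1] are all hypothetical.

```python
from typing import Callable

# Hypothetical model signature: prompt in, (answer, confidence in [0, 1]) out.
Model = Callable[[str], tuple[str, float]]


def escalate(prompt: str, tiers: list[tuple[str, Model]],
             threshold: float = 0.8) -> tuple[str, list[str]]:
    """Try models from cheapest to most expensive; stop at the first confident answer.

    Returns the final answer and the names of the tiers that were actually called.
    """
    if not tiers:
        raise ValueError("at least one model tier is required")
    answer, used = "", []
    for name, model in tiers:
        answer, confidence = model(prompt)
        used.append(name)
        if confidence >= threshold:
            break  # good enough: skip the more expensive tiers
    return answer, used
```

From the caller's side this still looks like one model call, which is the property the summary highlights; the list of tiers used is returned here only to make the behavior observable.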

x.com · 9 min

The Claude Opus 4.8 Setup Guide: How to Get Maximum Quality for Minimum Cost (Exact Config Inside)

A hands-on configuration guide published the day after Claude Opus 4.8's release. The core value lies not in benchmark improvements (SWE-bench 87.6% → 88.6%) but in three operational features: Effort Control for per-task reasoning depth, Fast Mode, now 3x cheaper than before, and Dynamic Workflows supporting up to 1,000 parallel subagents. The author provides a cost-optimization matrix routing tasks to Haiku/Sonnet/Opus at different effort levels, claiming ~50% monthly savings ($400-600 down to ~$205) for heavy users. Includes copy-paste configs for environment variables and settings.json. Practical for Claude Code users focused on cost control, though the savings claims are unverified estimates.

magazine.sebastianraschka.com · 45 min

Practical Guide to Setting Up a Local Coding Agent Stack with Open-Weight Models (A Hands-On Guide to Building a Local Coding Agent: Qwen3.6, Codex, and Claude Code in Practice)

This is a step-by-step tutorial for building a fully local coding agent using open-weight LLMs (primarily Qwen3.6 35B-A3B) served via Ollama and the Qwen-Code harness. The author covers model selection, speed/memory benchmarking with a custom script, a small agent capability evaluation (5 tasks), and a security audit checklist before running any harness. It then compares the same local model across three harnesses—Qwen-Code, Codex (open-source), and Claude Code—finding that Codex achieves the same task success rate with roughly half the token usage of Claude Code. The guide also explains SSH tunneling to run the model on a dedicated machine (e.g., DGX Spark) while using the harness on the main workstation. Targeted at engineers comfortable with the CLI who want a transparent, inspectable, and free alternative to proprietary coding agents.

claude.com · 16 min

Building effective human-agent teams (Four Key Principles for Humans and AI Agents Working as a Team)

Anthropic shares four lessons from months of internal testing on building human-agent teams. The shift is from a single-player experience (one human, one AI) to a multiplayer model where agents hold their own credentials, persistent memory, and broad access, joining team channels as full members. The key insights: work in public so agents have context, define clear roles and tool access for every member, set an ambitious north star to make agents proactive, and build trust by granting autonomy gradually. Includes practical examples like agent-led bug backlogs and doer-verifier patterns. A must-read for teams embedding AI agents into collaborative workflows.

Section 06

Meta-Thinking: The Superlinear Path of Research, Choice, and Growth

3 / 24

x.com · 10 min

how to be good at research (The Researcher's Trainable Skill Stack: From Picking Problems to Deliberately Making Mistakes)

A thread by @itsreallyvivek arguing that research skill is a stack of trainable sub-skills, not a gift. Core moves: pick problems you genuinely want to exist (Schulman), upgrade inputs by reading old papers and skipping summaries, write everything down to expose hidden gaps (Graham, Feynman, Darwin), tighten the experimental loop with scripted tooling (Karpathy), stare directly at failure cases instead of loss curves (Andrew Ng), deliberately wander across subfields to find your unfair advantage, and cultivate collaborators who will tell you an idea is bad. The post synthesizes concrete tactics from Hamming, Sutton, Shannon, and others, emphasizing falsifiable forecasts, reproducible tooling, and reading raw data over third-hand threads. Actionable for research engineers and PhD students tired of surface imitation.

www.paulgraham.com · 25 min

Superlinear Returns (Superlinear Returns: Understanding the World's True Driving Force)

Paul Graham explains why 'you get out what you put in' is rarely true: in business, science, art, and many other fields, performance yields superlinear returns. He traces this to two fundamental causes: exponential growth (learning compounds, startups scale) and thresholds (winner-take-all in sports, science, fame). As technology advances and institutions weaken, more people can now pursue superlinear returns independently. The essay provides practical heuristics: work on what genuinely interests you, keep learning, take calculated risks, and don't conflate a job with your real work. Graham argues that curiosity, not just ambition, is the most powerful way to find those rare opportunities where the reward curve steepens dramatically. A must-read for any ambitious engineer, founder, or creator.

www.paulgraham.com · 1 min

What Languages Fix (What Problems Are Programming Languages Solving?)

Paul Graham shares a witty perspective on programming languages by describing each in terms of the problem it was designed to fix. From Algol's reaction to low-level assembly to Python's distaste for Perl's complexity, the list captures the evolutionary logic of language design. This concise, humorous comparison offers engineers a fresh way to think about language selection and the historical context of their tools.
