Glean 拾遗
← All issues
#002 Latest 6/1–6/7 Published Jun 7

When Agents Keep Their Own Schedules: From Babysitting to Set-and-Forget

AI coding crossed a critical threshold this week. The ‘agent’ in our headlines gained a tense—it learned to sustain itself, orchestrate its own resources, and deliver finished work. From Anthropic’s dynamic workflows that let Claude write its own harnesses, to Cursor’s revelation that 35% of their merged PRs now come from cloud agents, the paradigm is shifting from single-prompt transactions to multi-hour autonomous deliveries. The developer’s role is being recast as a dispatcher and reviewer. This new power demands new disciplines: you’re no longer just writing code, but designing contexts, curating memory, and understanding token economics as a first-class system constraint. This issue of Glean traces the emerging blueprint for managing autonomous agents—covering the orchestration patterns, memory architectures, and counterintuitive engineering lessons that separate functional systems from token-burning chaos.

30 picks 6 sections ~6 hr
Section 01

Orchestration as Code: When AI Writes Its Own Script

6 / 30
x.com · 9 min
01

The Complete Guide to /goal, /loop, /schedule & Stop Hooks in Claude CodeClaude Code 自主运行完全指南:/goal、/loop、/schedule 与 Stop Hook

A complete guide to four autonomous Claude Code commands that eliminate per‑step babysitting. /goal sets a success condition checked by a fast evaluator after each turn; /loop runs on a fixed time interval; /schedule creates background tasks independent of an open session; Stop Hooks let you programmatically decide when Claude may stop (e.g., via test‑suite scripts). The article provides templates, real examples, condition‑writing rules, and the /goal + Auto mode combination for fully unattended work. It contrasts use‑cases and offers a decision matrix for choosing the right command, turning Claude from a prompt‑driven assistant into an autonomous coding agent.

x.com · 5 min
02

Claude Code /goal: Autonomous Task Completion Without BabysittingClaude Code 的 /goal 命令:告别手动“继续”,让 AI 自主完成编码任务

The /goal command in Claude Code enables autonomous task completion by letting it loop turn after turn until a verifiable condition is met. An evaluator model (Claude Haiku by default) checks the transcript each turn. The post explains how to write effective goals (specific, measurable, output-verifiable), project setup tips (CLAUDE.md, hooks, Auto Mode), and common pitfalls like vague goals causing token waste and evaluator hallucination. It compares /goal with /loop and stop hooks. For developers tired of nudging AI, this is a practical guide to hands-off coding sessions.

x.com · 15 min
03

A harness for every task: dynamic workflows in Claude CodeClaude Code 动态工作流:让 AI 自动编写任务专用的编排脚本

Anthropic engineer Thariq Shihipar details dynamic workflows in Claude Code, where Claude auto-generates custom JavaScript harnesses to orchestrate multiple subagents. It explains how this overcomes single-context-window failures like agentic laziness, self-preferential bias, and goal drift. Common patterns such as classify-and-act, fan-out-and-synthesize, adversarial verification, and tournament are illustrated with concrete use cases from migrations and deep research to root-cause analysis. The post candidly advises that workflows are token-heavy and not needed for routine coding, offering practical tips on token budgets, saving workflows as skills, and pairing with /goal and /loop.

claude.com · 19 min
04

Multi-Agent Coordination Patterns: Five Approaches and When to Use Them多智能体协调五模式:选型指南与权衡

This post systematically covers five multi-agent coordination patterns: generator-verifier, orchestrator-subagent, agent teams, message bus, and shared state. For each, it explains the mechanism, where it works well, and known struggles (e.g., verifier quality depends on explicit criteria, orchestrator becomes information bottleneck, agent teams require independent subtasks, message bus tracing is hard, shared state risks reactive loops). It recommends starting with the simplest pattern and evolving based on where it struggles, with decision guides comparing patterns side by side. Suitable for engineering teams building multi-agent systems.

x.com · 11 min
05

Claude Subagents vs. Agent Teams, explainedClaude 子代理 vs 智能体团队:多智能体架构该如何选

Compares two Claude multi-agent paradigms: sub-agents (fire-and-forget, isolated context, return compressed results) for embarrassingly parallel tasks, and agent teams (persistent, direct peer communication, shared task list) for ongoing coordination. Provides design principles: decompose by context boundaries, not by roles; start simple and add complexity only when measurable. Covers five orchestration patterns, three situations where multi-agent systems are justified, and common failure modes. Practical advice with code examples for engineers building LLM‑powered agents.

jacobharr.is · 26 min
06

Why I Don’t Vibe Code我为什么拒绝‘氛围编程’

The author refuses to vibe code because he is cheap, experienced, loves messy details, sees friction as a gift, and cares about quality and accountability. Drawing on Fred Brooks' No Silver Bullet, he argues that LLMs reduce only accidental complexity, not essential complexity. He highlights the danger of data abstraction without skepticism using DOGE’s misinterpretation of Social Security records. He insists that the joy and friction of programming are essential to good design. The essay also covers ethical concerns and the human cost of AI-driven development. A must-read for developers questioning the AI coding trend.

Section 02

Agent Anatomy: Dissecting the 12 Parts of a Production-Grade Harness

5 / 30
x.com · 19 min
07

The Anatomy of an Agent HarnessAgent Harness 解剖:构建生产级 Agent 的 12 个组件

A deep dive into the 12 components of a production-grade agent harness, synthesizing practices from Anthropic, OpenAI, LangChain, and others. It argues that the harness—not the model—determines real-world agent performance, citing evidence like LangChain's 20+ rank jump on TerminalBench and Claude Code's 95% context reduction. Essential reading for engineers building or debugging AI agents.

x.com · 20 min
08

How to build your own agent harness???用可替换 Worker 构建你自己的 Agent 控制框架——iii 架构详解

iii proposes an alternative to monolithic agent frameworks: decomposing the harness into a set of independent workers on a shared WebSocket engine. Each of the 15 responsibilities (turn orchestration, auth, policy, approval, session, etc.) is a separate worker that registers functions and triggers on a common bus via iii.trigger(). The architecture makes every layer replaceable—swap the model catalogue by writing a worker that registers models::list, add a provider with provider::<name>::stream, without touching the rest of the stack. The post walks through the turn loop in detail, explaining how provisioning, streaming, tool execution, approval gating, and teardown work across workers. Concrete replacement examples include model catalogue, credential store, approval UI, and policy engine. The entire harness is open source (github.com/iii-hq/workers), with all workers operating under the same protocol, yielding full OpenTelemetry tracing. The key insight: 'build your own harness' means swapping workers, not forking a framework, enabling a slider between thin and thick setups.

engineering.leanix.net · 7 min
09

Why Your AI Agent Is Drowning in Tools (And How Code Mode Saves It)为什么你的AI代理被工具淹没(以及代码模式如何拯救它)

When an AI agent integrates many MCP tools, it risks context bloat and tool hallucination — 50+ tools can eat 5–7% of the context window. Traditional remedies like agent-side filtering and MCP-side reduction have trade-offs. Code mode lets the LLM search and execute tools via code, slashing token usage, enabling complex control flow, but adding debugging and infrastructure overhead. Cloudflare and Anthropic examples show that the real lesson is to keep a reasonable toolset driven by use cases, not magic numbers.

x.com · 11 min
10

5 Agent Design Patterns for Long-running AI Agents构建生产级长时间运行AI Agent的5种设计模式

Google Cloud presents five design patterns for building AI agents that run up to seven days: checkpoint-and-resume for state durability, delegated human-in-the-loop with zero-cost pausing, layered memory governance (Memory Bank, Profiles, Agent Identity/Registry/Gateway) against drift and leakage, ambient event-driven processing with externalized policies, and fleet orchestration using independently deployed specialists. Each pattern includes ADK code examples and diagrams, addressing production concerns like memory drift. Aimed at developers scaling agents from chatbots to autonomous workers.

x.com · 8 min
11

Lessons from Building Claude Code: Prompt Caching Is Everything构建 Claude Code 的教训:提示缓存就是一切

Anthropic engineer shares hard-won lessons from optimizing prompt caching in Claude Code. Prompt caching relies on strict prefix matching, so the order of static vs dynamic content is critical — static system prompts, tools, and context must come first. The post reveals counterintuitive pitfalls: don't update the system prompt mid-conversation (pass updates via messages instead), never switch models or modify tool sets mid-session (it invalidates the entire cache), and when compacting context, reuse the parent conversation's prefix to avoid paying full price for tokens. Practical patterns include using tools like EnterPlanMode to model state transitions, deferring tool loading, and running alerts on cache hit rate. A must-read for anyone building long-running agentic products.

Section 03

Persistent Memory: Building AI Workspaces That Remember

7 / 30
x.com · 10 min
12

Designing for Agents: Patterns, Feedback, and Context为 Agent 而设计:交互模式翻转与三条实践原则

Ramp’s MCP weekly active users grew 10x in 3 months; Salesforce launched Headless 360, signaling that 80% of software interaction is shifting to agents. The article proposes a new pattern: User → User’s Agent → Software’s Agent → Database, and offers three practical heuristics: proactively teach calling agents how to succeed (like Notion pre-loading a Markdown spec); build feedback loops via required rationale, a feedback tool, and purpose-built seeds; mind the context gap in agent-to-agent interactions by letting each side contribute what it knows best. Essential reading for product teams building agent-native interfaces.

x.com · 8 min
13

Andrej Karpathy says 99% of AI users miss 7 basics. Full breakdown.Andrej Karpathy 亲述:99% 的 AI 用户不知道的 7 个基本功

Andrej Karpathy — OpenAI co-founder, former Tesla AI head — argues the bottleneck for most AI users isn't the model or the prompt, but the lack of a system around it. This breakdown covers his 7 practical rules: provide full context instead of magic prompts; curate a proper CLAUDE.md; adopt a /raw, /wiki, and config three-layer memory; permanently save strong outputs as reference pages; maintain index.md and log.md for long projects; treat AI as a super-intern with no taste, working in small verified steps; and add one line to render research as navigable HTML. Aimed at engineers stuck in prompt tweaking loops, these habits take an afternoon to set up and compound fast.

x.com · 15 min
14

Turn Claude into a Consistent Assistant with CLAUDE.md: 21 Essential Instructions用21条指令写好 CLAUDE.md,让 Claude 记住你的偏好不再从零开始

Every new Claude session starts with zero memory, forcing you to re-explain preferences and correct the same mistakes. CLAUDE.md is a persistent instruction file that Claude automatically reads, providing context, voice, and behavioral rules from the very first message. This guide presents 21 practical instructions grouped into communication style, behavior constraints, personal context, session memory, and developer-specific safeguards. Each instruction includes the rationale and a ready-to-use snippet. By creating a CLAUDE.md file with even a few of these rules, you can dramatically improve output consistency and save hours each week. Ideal for engineers, writers, and anyone who uses Claude professionally.

x.com · 5 min
15

8 proven tips for crafting a CLAUDE.md that truly understands your project让Claude Code更懂你:写好CLAUDE.md的8条实战经验

This article distills 8 practical tips for optimizing CLAUDE.md to make Claude Code better aligned with your project: keep it under 200 lines to avoid information overload; maintain a 'do not introduce' list; define actionable coding rules (e.g., use named exports, ban any type); treat CLAUDE.md as a router to other docs, not a library; localize configs for sensitive modules; enforce key rules via hooks; use a MEMORY.md file for cross-session memory; and predefine work style preferences. These insights come from real-world use, backed by concrete examples and contrast cases, targeting engineers who use AI coding assistants.

x.com · 13 min
16

Building an AI Second Brain with Claude and Obsidian: The Complete Tutorial用 Claude 和 Obsidian 搭建 AI 第二大脑:从零到可用的完整教程

A hands-on tutorial on connecting Claude to an Obsidian vault, turning your notes into a queryable knowledge engine that reasons over your own context. Covers vault structure (PARA method), AI-first note design, three Claude integration methods (Projects upload, Claude Code direct access, MCP servers), and five ready-to-use workflows (weekly digest, research synthesis, idea connection, knowledge gap auditing, daily briefing). Best for developers, researchers, and knowledge workers building a persistent personal knowledge system.

x.com · 17 min
17

How to Stop Hitting Claude Usage Limits: 23 Token-Saving HabitsClaude 额度总爆?23 个省 token 习惯,每月只超限一次

A personal guide of 23 habits to reduce Claude token usage, based on author's experience and Anthropic docs. Includes converting files before upload, planning in Chat before building files, using edit instead of follow-ups, and voice-to-text for richer prompts. Helps paid users go from daily limits to hitting them once a month. For heavy Claude users.

x.com · 16 min
18

Meta-Meta-Prompting: The Secret to Making AI Agents WorkMeta-Meta-Prompting:Garry Tan 如何用 AI 构建真正运转的第二大脑

Garry Tan, CEO of Y Combinator, presents GBrain, his personal AI agent system built on 100,000 pages of structured knowledge and over 100 modular skills. The core architecture follows a “thin harness, fat skills, fat data” philosophy: a lightweight runtime like OpenClaw routes messages to self-contained skill files, which are themselves created and improved by a meta-skill called Skillify. Tan illustrates the compounding value through the “book-mirror” pipeline, which cross-references a book’s ideas with his actual life events, journal entries, and meeting notes. He details the evolution from an error-prone first version to a reliable workflow using multi-model cross-modal evaluation and deep brain retrieval. Other examples include automated meeting preparation that synthesizes months of accumulated context and entity propagation that updates every related person or company page after a conversation. The article provides a concrete architecture overview, evidence of iterative improvement, and a four-step starting guide for developers building personal compounding AI systems.

Section 04

From Lab to Prod: The Brutal Truth of Real-World Deployments

7 / 30
openai.com · 15 min
19

Building self-improving tax agents with Codex用 Codex 构建自改进税务 AI:生产反馈闭环实践

OpenAI and Thrive Holdings co-developed Tax AI for Crete's accounting firms, using a Codex-driven self-improvement loop. The system processed 7,000 returns with 97% accuracy and 50% throughput increase, cutting one senior accountant's prep time from 180 to 15 hours. The design relies on three pillars: practitioner feedback, production traces, and a Codex iteration cycle. A detailed rental property example shows how practitioner corrections become eval targets, then Codex investigates and proposes fixes. Practical for teams building self-improving agents in expert domains.

blog.cloudflare.com · 51 min
20

Orchestrating AI Code Review at ScaleCloudflare 多智能体代码审查实战:7 个专项 Agent 并行,30 天跑完 13 万次 review

Cloudflare built an AI code review system on OpenCode, orchestrating up to 7 domain-specific agents (security, performance, docs, etc.) via a coordinator. Over 30 days it processed 131k+ reviews with a median latency of 3m39s and average cost of $1.19. The post dives deep into plugin architecture, risk tiers, circuit breakers, incremental re-reviews, prompt injection prevention, and honest limitations. Suitable for engineers exploring AI-assisted development and CI/CD integration at scale.

x.com · 12 min
21

Getting the most out of Codex不止写代码:Codex 持久线程、目标验证与自动化全景

This guide shows how to extend Codex from a code assistant to a persistent work system built around durable threads. Readers will learn: using pinned threads with shortcuts (Command-1–9) to preserve context across sessions; voice input for rough ideas; steering and queuing to correct or schedule tasks mid-flight; heartbeat-triggered thread automations (e.g., periodic Slack/Gmail checks); and long-running Goals with test verifiers. The side panel supports inline review of artifacts, while an Obsidian vault serves as shared memory for cross-thread decisions. For engineers integrating AI deeply into their daily workflow.

github.com · 9 min
22

A Multi-Agent IDE to Run Claude Code, Codex, and Others in Parallel Git Worktrees多代理并行 IDE:在一个工作区同时调度 Claude Code、Codex 等 AI 编程代理

Orca is a desktop and mobile IDE designed to run multiple AI coding agents—such as Claude Code, Codex, and Grok—concurrently. It leverages Git's worktree mechanism to give each agent an isolated working directory, eliminating the need for stashing or branch juggling. Users can observe and control all agents from a single interface with tabbed panes, built-in diff review, and direct GitHub Issue/PR integration. It's built for developers who rely on CLI-based coding agents and need to handle multiple features or refactors in parallel.

x.com · 4 min
23

The third era of AI software developmentCursor 踏入 AI 编程第三纪元:云端 Agent 独立作业,内部 35% PR 来自机器

Cursor reflects on three eras of AI-assisted coding: from Tab autocomplete, to synchronous agents, to cloud agents autonomously handling hour-long tasks. Internally, 35% of merged PRs now come from these agents, and agent users have surpassed Tab users. The developer's role shifts to problem definition, setting review criteria, and parallel orchestration. Agents return reviewable artifacts—logs, videos, previews—rather than diffs.

x.com · 12 min
24

Kimi's Agent Swarm: 300 agents, one prompt, real file outputs.300 个智能体,一个提示词,输出真实文件:Kimi 的隐藏利器

Kimi's Agent Swarm is an underused multi-agent orchestration system that turns one prompt into real file outputs—resumes, websites, datasets, reports—by coordinating up to 300 domain-specialized agents. This thread by @0xDepressionn shares concrete examples: 100 tailored CVs, a 100,000-word literature review, and 30 landing pages, each replacing thousands of dollars in professional labor. The author distills 15 actionable rules for harnessing Agent Swarm effectively: write project briefs, not questions; batch tasks for leverage; specify output format upfront; attach source files; and save repeatable workflows as Skills. The result is a shift from single-question chatbots to high-volume deliverable generation, making Kimi a cost-effective alternative to expensive services.

x.com · 7 min
25

The Kimi K2.6 Blueprint: One-Person Agency at $80k/MonthKimi K2.6 代理蓝图:一人团队的 8 万美元月收入公式

This thread presents a blueprint for a one-person AI agency using Kimi K2.6, claiming to replace an entire dev team. It details the model's MoE architecture (1T params, 32B activated), SWE-Bench score of 65.8, and the Agent Swarm that runs 300 sub-agents in parallel. It also covers the tech stack (Kimi API, CLI, Swarm, MCP servers, n8n), service offerings (lead gen, knowledge bases, support automation), pricing, client acquisition via job listing monitoring, and a cost model projecting $500/month overhead and $72k+ monthly profit. The content leans heavily promotional, with unverified revenue figures.

Section 05

Signal & Noise: The Week’s Undercurrents and Snapshots

3 / 30
tw93.fun · 2 min
26

AI Amplifies Output, Not Input: My /learn Workflow for Deep Technical DivesAI 放大的是输出,不是输入:如何用 /learn 流程深入学习一个技术领域

The author shares a personal workflow for deep learning in the AI era: treat learning like coding, structured as collect → filter → outline → draft → AI-assisted tightening → self-review. The core argument is that AI's real value lies in amplifying your output, not in summarizing input. Using a recent deep dive into LLM training as an example, the post introduces the /learn skill in the open-source Waza toolkit to industrialize this process. Recommended for engineers wondering how to maintain depth while leveraging AI.

www.infoq.com · 5 min
27

OpenTelemetry Launches Blueprints Initiative to Simplify Enterprise Observability AdoptionOpenTelemetry 推出 Blueprints 计划:以预设架构与参考实现降低企业可观测性落地门槛

OpenTelemetry has launched the Blueprints initiative to reduce the complexity of large-scale observability adoption. It provides prescriptive architectural patterns, operational best practices, and implementation steps for common scenarios, along with reference implementations from Adobe, Mastodon, Skyscanner, and others. The article explains how accidental complexity—fragmented pipelines, inconsistent semantic conventions, broken context propagation—emerges when organizations adopt OpenTelemetry organically without central standards. Blueprints focus on Kubernetes observability, non-Kubernetes infrastructure, and centralized telemetry platforms, aiming to operationalize telemetry consistently. The initiative reflects a broader shift toward opinionated operational frameworks in cloud-native infrastructure, targeting platform engineering, DevOps, and SRE teams grappling with observability sprawl.

mp.weixin.qq.com · 7 min
28

Weekly AI Roundup: Claude Limits Doubled, SpaceX IPO, Microsoft Model Data Contradiction2026年6月第一周AI快讯:Claude限额翻倍、SpaceX IPO、微软模型数据翻车

A roundup of 10 major AI and tech news items from the first week of June 2026. MiniMax M3 was released, beating GPT-5.5 on coding benchmarks at $0.6/M tokens, though independent verification is pending. DeepSeek raised ~$7.4B in its first external funding round, while Unitree completed its IPO review in a record 73 days. Kimi Work, Coze 3.0, and Qwen3.7-Plus all launched new Agent capabilities. Doubao announced subscription plans. ChatGPT surpassed 1 billion monthly active users. Anthropic doubled Claude Cowork's usage limits, secretly filed for an IPO, and published a report stating Claude writes 80% of its own code. NVIDIA unveiled the ARM-based RTX Spark at Computex. SpaceX is set to IPO on June 12, with Google disclosed paying $920M/month for compute. Microsoft's MAI-Thinking-1 faced backlash after its claimed 'clean data' was revealed to include Common Crawl, and GitHub Copilot's switch to metered billing caused developer bills to spike.

Section 06

More

2 / 30
github.com · 1 min
29

Understand Anything: Turn any codebase into an interactive knowledge graph you can exploreUnderstand Anything:把任何代码库变成可交互的知识图谱

Understand Anything is an open-source tool that turns any codebase into an interactive knowledge graph for exploration, search, and query. Instead of static diagrams, it builds a persistent, navigable knowledge base that integrates with AI coding tools like Claude Code, Cursor, and Codex. It parses code structure and semantic relationships to make logical connections tangible, helping developers quickly onboard legacy systems, locate business logic, or navigate complex codebases.

tw93.fun · 27 min
30

What Really Differentiates LLMs Happens After Pretraining: A Full Post-Training Pipeline Breakdown大模型真正拉开差距的地方在预训练之后:一条后训练链路的完整拆解

A comprehensive deep-dive into the full LLM training pipeline, arguing that the real capability gap in 2026 lies not in pretraining but in the post-training stack: instruction tuning, RL, reward design, Agent training, and distillation. The article breaks down the end-to-end process step-by-step — from data recipes and system architecture constraints, through the four-stage post-training pipeline (Cold Start SFT → GRPO-based Reasoning RL → Rejection Sampling FT → Alignment RL), Grader/Reward evaluation loops, Agent training with PARL and Meta-Harness, to distillation and deployment. Key engineering insights include DeepSeek-R1's public recipe, why GRPO simplifies PPO by removing the value network, PRM vs ORM trade-offs, and the shift from optimizing answers to optimizing harness programs. Targeted at engineers who want to trace concrete capability gains back to specific training stages.