Glean 拾遗
Recent picks

13 picks · chronological

06-12

How an Anthropic seller rebuilt his team's workflows with Claude Code

Jared Sires, a former account executive at Anthropic with no coding experience, used Claude Code to build CLAFTS, a Gmail-integrated tool that drafts customer emails in his voice while pulling context from live product documentation. The tool saves 10-15 hours per week. He expanded this into a sales plugin with skills for daily briefs, recaps, and pipeline management, wired into Salesforce, Gong, and other systems via MCP servers. About 80% of Anthropic's sales org now uses the plugin. The piece illustrates how non-technical practitioners can leverage AI coding tools to eliminate technical barriers and deliver workflow-specific software.

claude.com · 9 min · Agent Architecture · AI Engineering · Claude Code
06-12

25 Claude Features, Workflows, and Tricks That Most Users Don't Know

A practical guide by @eng_khairallah1 detailing 25 workflows and techniques to fully leverage Claude Projects. The core thesis is treating Projects as evolving, persistent workspaces rather than transient chat sessions. It provides actionable strategies including a structured instruction template, strategic file organization, the Living Instructions pattern, and advanced concepts like voice calibration files and competitive intelligence hubs. The guide emphasizes a compounding knowledge strategy where each interaction refines Claude's contextual understanding, suitable for power users aiming to transform Claude from a generic tool into a domain-specific specialist.

x.com · 16 min · Agent Architecture · AI Engineering · Anthropic
06-12

How To Build AI Agents in 2026 (That Actually Work)

This article systematically deconstructs the architecture and engineering practices for building practical AI agents. It clarifies the boundaries between chatbots, AI agents, and agentic AI, emphasizing that a real agent is a system that persistently loops toward a goal rather than delivering a one-shot answer. The author explains the ReAct loop (Reasoning + Acting) and breaks down the five building blocks: the LLM as the brain, tools as hands, short-term and long-term memory, self-correcting loops, and verification. Using a case study of a startup research agent for the fitness niche, the article walks through goal setting, tool integration, loop construction, memory implementation, and the addition of a critic agent, complete with copy-paste system prompts. It highlights six common failure modes and recommends a 2026 tech stack including Claude Code, LangGraph, and MCP. The piece provides a weekend roadmap to build a basic agent from a 50-line Python script and is aimed at developers shifting from prompt engineering to designing agent systems.

x.com · 21 min · Agent Architecture · AI Agents · AI Engineering
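As a rough illustration of the ReAct loop described above, here is a minimal Python sketch in which a scripted function stands in for the LLM. `Step`, `react_loop`, and `scripted_model` are hypothetical names, not code from the article. The `max_steps` budget plays the role of a hard stop, so the loop cannot run forever.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    thought: str
    action: Optional[str]  # tool name, or None when the model answers directly
    arg: str = ""          # tool input, or the final answer when action is None

def react_loop(model: Callable[[list], Step],
               tools: dict,
               goal: str, max_steps: int = 5) -> Optional[str]:
    """Reason, act with a tool, observe the result, and repeat until an answer."""
    transcript = [f"Goal: {goal}"]  # short-term memory for this run
    for _ in range(max_steps):
        step = model(transcript)
        transcript.append(f"Thought: {step.thought}")
        if step.action is None:
            return step.arg  # a final answer ends the loop
        tool = tools.get(step.action)
        observation = tool(step.arg) if tool else f"unknown tool {step.action!r}"
        transcript.append(f"Action: {step.action}({step.arg})")
        transcript.append(f"Observation: {observation}")
    return None  # step budget exhausted without an answer

def scripted_model(transcript: list) -> Step:
    # Hypothetical stand-in for an LLM call: search once, then answer.
    if not any(line.startswith("Observation") for line in transcript):
        return Step("I need data first", "search", "fitness startups")
    return Step("I have enough to answer", None, "Found 3 fitness startups")

tools = {"search": lambda q: f"3 results for {q}"}
print(react_loop(scripted_model, tools, "List fitness startups"))  # Found 3 fitness startups
```

Swapping `scripted_model` for a real model call is where a production agent would differ; the control flow stays the same.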
06-11

Designing loops with Fable 5: self-correction and cross-session memory

R. Lance Martin demonstrates two loop patterns for Anthropic's Fable 5: self-correction and cross-session memory. On the Parameter Golf challenge (train the best model that fits in a 16MB artifact, in under 10 minutes on 8×H100s), Fable 5 running on CMA (Claude Managed Agents) with a verifier sub-agent improved the pipeline roughly 6× more than Opus 4.7, favoring structural changes over scalar tuning. On a continual-learning SQL benchmark, Fable 5 worked through a fail, investigate, verify, distill cycle and turned its findings into general rules, reaching 73% verification coverage, while Opus 4.7 and Sonnet 4.6 stalled at sparse notes or uncertain schema references. The key takeaway: design loops and environment feedback so the model can hill-climb, rather than relying on direct prompting.

x.com · 5 min · Agent Architecture · Agents · AI Engineering
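A self-correction loop of the kind described above can be sketched as a proposer paired with an independent checker whose failed criteria become the next round's input. This is a minimal Python sketch under stated assumptions: `Attempt`, `verify`, and `self_correct` are hypothetical names, the `score` metric is a placeholder (lower is better), and "16MB" is read as 16,000,000 bytes. None of this is the author's actual harness.

```python
from dataclasses import dataclass
from typing import Callable

MAX_ARTIFACT_BYTES = 16_000_000  # assumption: "16MB" read as 16,000,000 bytes
MAX_TRAIN_SECONDS = 10 * 60

@dataclass
class Attempt:
    artifact_bytes: int
    train_seconds: float
    score: float  # hypothetical validation metric, lower is better

def verify(a: Attempt, best_score: float) -> list:
    """Independent checker: returns the failed criteria (empty list = pass)."""
    failures = []
    if a.artifact_bytes > MAX_ARTIFACT_BYTES:
        failures.append("artifact exceeds 16MB")
    if a.train_seconds > MAX_TRAIN_SECONDS:
        failures.append("training exceeds 10 minutes")
    if a.score >= best_score:
        failures.append("no improvement over best score")
    return failures

def self_correct(propose: Callable[[list], Attempt],
                 baseline_score: float, rounds: int = 5) -> float:
    """Proposer suggests, verifier judges; failures feed the next proposal."""
    best, feedback = baseline_score, []
    for _ in range(rounds):
        attempt = propose(feedback)  # the proposer sees last round's failures
        feedback = verify(attempt, best)
        if not feedback:
            best = attempt.score     # only verified improvements are kept
    return best
```

Keeping `verify` separate from `propose` means an attempt counts only when it passes checkable criteria, not when the proposer says it succeeded.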
06-11

The Missing Link Between Agents and Applications

This article introduces Headless Tools, a mechanism that allows agents to act directly on client-side runtimes such as browsers and desktop applications. The author argues that most current agent tools are server-side, limiting them to API calls while blocking access to browser state, device APIs, and in-app actions. Headless Tools wrap client-side capabilities like geolocation, clipboard, IndexedDB, and application-specific commands as standard tools invocable by the model. The model sees only a tool schema, while the server and client coordinate execution behind the scenes. Code examples in TypeScript demonstrate the pattern, alongside real-world use in a Slidev presentation plugin and browser-local agent memory. Privacy is improved because sensitive data can remain on-device. This is valuable for teams embedding AI agents into rich frontend contexts such as design tools, document editors, and desktop utilities.

x.com · 7 min · AI Agents · AI Engineering · Browser
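The article's examples are in TypeScript. The Python sketch below shows only the shape of the Headless Tools idea: the model sees an ordinary tool schema, and the server forwards the call to a client runtime that executes it locally. `ClientRuntime`, `Server`, and `get_clipboard` are hypothetical names. A direct method call with JSON messages stands in for whatever transport a real client and server would use.

```python
import json
from typing import Any, Callable

class ClientRuntime:
    """Stands in for the browser or desktop app; handlers run client-side."""
    def __init__(self) -> None:
        self.handlers: dict = {}

    def register(self, name: str, fn: Callable[[dict], Any]) -> None:
        self.handlers[name] = fn

    def handle(self, message: str) -> str:
        call = json.loads(message)
        result = self.handlers[call["name"]](call["input"])
        return json.dumps({"id": call["id"], "result": result})

class Server:
    """Exposes client-side capabilities to the model as ordinary tool schemas."""
    def __init__(self, client: ClientRuntime, schemas: list) -> None:
        self.client = client
        self.schemas = schemas
        self.headless = {s["name"] for s in schemas}

    def tools_for_model(self) -> list:
        return self.schemas  # the model only ever sees these schemas

    def execute(self, call_id: str, name: str, args: dict) -> Any:
        if name not in self.headless:
            raise KeyError(f"unknown tool {name!r}")
        # A real system would send this over a network channel; a direct
        # method call keeps the sketch self-contained.
        reply = self.client.handle(json.dumps({"id": call_id, "name": name, "input": args}))
        return json.loads(reply)["result"]

client = ClientRuntime()
client.register("get_clipboard", lambda _args: "copied text")  # runs in the client runtime
server = Server(client, [{
    "name": "get_clipboard",
    "description": "Read the user's clipboard text",
    "input_schema": {"type": "object", "properties": {}},
}])
print(server.execute("call_1", "get_clipboard", {}))  # copied text
```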
06-11

Training an LLM to Generate Reliable Structured Output Using GRPO and a Reward Function

A hands-on report on replacing labeled data with a code-defined reward function to train a model for reliable structured output. The author fine-tunes Qwen3-8B for JSON invoice extraction using GRPO. Supervised fine-tuning stalls because its token-level loss only optimizes for surface similarity, not structural validity. The fix: a reward function that scores completions 0.0 (invalid JSON), 0.5 (valid JSON but wrong schema), or 1.0 (fully compliant), so partially correct outputs still produce a graded learning signal. Training on Fireworks H200s raised schema-valid output from a baseline of 62% to 82% on held-out prompts, exceeding GPT-4.1's 58%, with lower cost and latency. The approach transfers to any task where correctness is verifiable in code, such as SQL, API calls, or tool use. The full reward function, dataset, and training config are provided.

x.com · 12 min · AI Engineering · Fine-tuning · GRPO
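A minimal Python sketch of the three-tier reward described above, followed by GRPO-style group-relative advantages (each reward minus the group mean, divided by the group standard deviation). The invoice fields in `REQUIRED_FIELDS` are a hypothetical schema, and using the population standard deviation is a choice made here. This is not the author's published reward function.

```python
import json
from statistics import mean, pstdev

# Hypothetical invoice schema: required field -> accepted JSON type(s).
REQUIRED_FIELDS = {
    "invoice_number": str,
    "date": str,
    "total": (int, float),
    "line_items": list,
}

def reward(completion: str) -> float:
    """1.0 = schema-compliant, 0.5 = valid JSON with wrong schema, 0.0 = invalid JSON."""
    try:
        data = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(data, dict):
        return 0.5
    for name, expected in REQUIRED_FIELDS.items():
        value = data.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return 0.5
    return 1.0

def group_advantages(rewards: list) -> list:
    """GRPO-style advantages: each reward relative to its sampling group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-6) for r in rewards]

group = [
    '{"invoice_number": "A1", "date": "2026-06-01", "total": 12.5, "line_items": []}',
    '{"invoice_number": "A1"}',
    'Sure! Here is the JSON:',
]
scores = [reward(c) for c in group]
print(scores)  # [1.0, 0.5, 0.0]
```

A group where every completion earns the same reward yields all-zero advantages and therefore no update. The intermediate 0.5 tier gives more groups some spread to learn from than a pass/fail score would.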
06-10

Designing loops with Fable 5: self-correction and memory in agentic workflows

The author shares two practical directions for improving agentic workflows with Anthropic's Claude Fable 5 model: self-correction loops and cross-session memory. In a Parameter Golf challenge—train the best model within a 16MB artifact in under 10 minutes on 8×H100 GPUs—Fable 5 improved the training pipeline roughly 6× more than Opus 4.7 when using Claude Managed Agents with Outcomes judged by an independent verifier sub-agent against nine checkable criteria. Fable 5 bet on larger structural changes and pushed through a quantization regression, while Opus 4.7 stuck to tuning scalar hyperparameters. For memory, the author used a SQL-based task from Continual Learning Bench 1.0 with filesystem-backed memory across agent sessions. Sonnet 4.6 only logged failures and guesses; Opus 4.7 built flagged schema references but verified only 17% of questions; Fable 5 reached 73% verification coverage in the best run and distilled learnings into general rules. Engineers interested in agent architecture and model capability boundaries will find the experiments relevant.

x.com · 5 min · Agent Architecture · AI Agents · AI Engineering
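Filesystem-backed memory across sessions, as described above, can be sketched as a JSON file that each new session reloads. `MemoryStore` and its file layout are hypothetical. "Verification coverage" is computed here as the fraction of questions with a verified note, which is an assumed definition and not necessarily the benchmark's.

```python
import json
import tempfile
from pathlib import Path

class MemoryStore:
    """File-backed notes and rules that persist across agent sessions."""
    def __init__(self, path: Path) -> None:
        self.path = path
        self.data = json.loads(path.read_text()) if path.exists() else {"notes": {}, "rules": []}

    def record(self, question_id: str, note: str, verified: bool) -> None:
        self.data["notes"][question_id] = {"note": note, "verified": verified}
        self.save()

    def distill(self, rule: str) -> None:
        # Promote a lesson that held across questions into a general rule.
        if rule not in self.data["rules"]:
            self.data["rules"].append(rule)
        self.save()

    def verification_coverage(self, total_questions: int) -> float:
        verified = sum(1 for n in self.data["notes"].values() if n["verified"])
        return verified / total_questions

    def save(self) -> None:
        self.path.write_text(json.dumps(self.data, indent=2))

path = Path(tempfile.mkdtemp()) / "memory.json"
s1 = MemoryStore(path)
s1.record("q1", "orders.total is stored in cents", verified=True)
s1.record("q2", "guess: join on user_id?", verified=False)
s1.distill("Check column units before aggregating")
s2 = MemoryStore(path)  # a later session reloads the same file
print(s2.verification_coverage(total_questions=4))  # 0.25
```

Separating verified notes from guesses in storage is what lets a later session tell the difference between them.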
06-09

Loop Engineering: Designing the System That Prompts Your Coding Agents

Addy Osmani argues that interacting with coding agents is shifting from prompt engineering to 'loop engineering'—designing a system that autonomously discovers tasks, delegates work, and verifies results using five building blocks: scheduled automations, parallel worktrees, project skills, connector plugins, and checker sub-agents. He maps how Claude Code and Codex both implement all five, noting that the leverage point has moved from writing good prompts to architecting persistent loops. The post cautions that loops amplify existing problems: verification, comprehension debt, and cognitive surrender become sharper risks. Intended for senior engineers evaluating how to productize AI coding tools beyond one-shot interactions.

x.com · 14 min · Agent Architecture · AI Agents · AI Engineering
06-09

How to Design a Loop That Prompts Your Agent

This article presents a loop architecture that enables an AI agent to autonomously complete multi-step tasks by building an automated prompting system instead of manually crafting each prompt. It breaks down the loop into five parts: defining a 'done' check, building prompts from dynamic state rather than hand-fed instructions, executing actions while capturing all outputs, feeding failures back as the next prompt, and setting hard stop conditions like max turns and cost limits. A walkthrough of fixing a login bug shows the loop in action, emphasizing that real costs come from repeated turns, making guardrails critical. Encapsulating repeated operations into reusable skills is highlighted as the key to long-term value, and common mistakes—like lacking an exit condition or discarding error output—are pointed out. Suitable for developers shifting from one-shot prompts to designing agent control flows.

x.com · 18 min · Agent Architecture · Agents · AI Engineering
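The five parts listed above can be sketched as one small Python loop: a done check, a prompt rebuilt from state, failure output fed back, and hard stops on turns and cost. `LoopState`, `build_prompt`, `run_loop`, and the fake agent and test functions are hypothetical, and the per-turn cost is a made-up number.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class LoopState:
    goal: str
    last_error: str = ""
    turns: int = 0
    cost_usd: float = 0.0
    done: bool = False

def build_prompt(state: LoopState) -> str:
    """Rebuild the prompt from current state instead of hand-writing it."""
    prompt = f"Goal: {state.goal}\nTurn {state.turns + 1}."
    if state.last_error:
        prompt += f"\nThe previous attempt failed with:\n{state.last_error}\nFix it."
    return prompt

def run_loop(agent: Callable[[str], float],
             check_done: Callable[[], Tuple[bool, str]],
             goal: str, max_turns: int = 10, max_cost_usd: float = 2.0) -> LoopState:
    state = LoopState(goal)
    # Hard stops: the budget is checked before each turn, so one turn may overshoot it.
    while state.turns < max_turns and state.cost_usd < max_cost_usd:
        state.cost_usd += agent(build_prompt(state))  # agent acts and reports the turn's cost
        state.turns += 1
        state.done, output = check_done()             # e.g. run the test suite
        if state.done:
            break
        state.last_error = output                     # keep the full failure output
    return state

calls = {"n": 0}

def fake_agent(prompt: str) -> float:
    calls["n"] += 1   # pretend to edit the code
    return 0.30       # hypothetical cost per turn

def fake_tests() -> Tuple[bool, str]:
    return calls["n"] >= 3, "AssertionError: login returned 401"

state = run_loop(fake_agent, fake_tests, "Fix the login bug")
print(state.done, state.turns)  # True 3
```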
06-08

Composable Agent Skills for Real Engineering Workflows

A collection of Matt Pocock's personal agent skills for Claude Code and Codex, targeting four common failure modes in AI-assisted development: misalignment, verbosity, broken code, and design entropy. Rather than dictating a fixed process, these small, composable skills embed engineering fundamentals: grilling sessions for alignment, a shared ubiquitous language for concision, TDD red-green-refactor loops for code quality, and architecture rescue tools. They work with any model and are designed to be hacked and adapted in your own .claude directory.

github.com · 14 min · Agents · AI Engineering · Claude Code
06-08

Every Agentic Engineering Hack I Know (June 2026)

The author shares 22 practical hacks for agentic engineering with Claude Code and Codex. The core is a plan-first workflow: use /ce-plan to generate a plan.md that guides the agent; humans skim or ask inline instead of reading it. Hacks include: voice input via Monologue or Wispr Flow (LLMs handle imperfect transcription); running 4-6 separate agent sessions in cmux tabs; defaulting terminal tabs to Claude Code and bypassing all permission prompts with sound alerts on completion; giving Claude an email address via AgentMail to trigger sessions remotely; using last30days before planning to search community discussions and news in parallel; turning repeated tasks into reusable skills to compound agent capabilities. He stresses that human value lies in providing taste and direction, not typing, and warns against AI addiction. The post is packed with copy-paste config snippets and concrete tools, aimed at engineers deep into AI-assisted development.

x.com · 28 min · Agent Infrastructure · Agents · AI Engineering
06-08

How to Master Dynamic Workflows in Claude Code: 6 Patterns and 14 Steps

This article provides a systematic guide to Dynamic Workflows in Claude Code, shipped in late May 2026. It moves beyond manual prompt chaining by letting Claude generate a bespoke JavaScript harness for a specific task. The author first explains the mental model: how workflows structurally fix agentic laziness, self-preferential bias, and goal drift inherent in single-context sessions. It then breaks down six core patterns—classify-and-act, fan-out-and-synthesize, adversarial verification, generate-and-filter, tournament, and loop until done—each with code skeletons. Real-world use cases show how to compose patterns for migrations, deep research, triage, and lightweight evals. Practical controls like /goal, /loop, token budgets, and the quarantine pattern for untrusted inputs are covered. It also advises on saving successful workflows and shipping them as Skills. This guide is for engineers aiming to tackle long-running, parallel, or adversarial tasks beyond a single Claude Code session.

x.com · 17 min · Agents · AI Engineering · Anthropic
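Dynamic Workflows generate a JavaScript harness, so the Python sketch below shows only the control flow of one of the six patterns, the tournament. It runs a single-elimination bracket over candidate outputs. `tournament` is a hypothetical helper, and the toy `judge` stands in for what would, in a real workflow, presumably be a separate model call comparing two outputs.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def tournament(candidates: List[T], judge: Callable[[T, T], T]) -> T:
    """Single elimination: pair candidates, keep each match's winner, repeat."""
    if not candidates:
        raise ValueError("need at least one candidate")
    round_ = list(candidates)
    while len(round_) > 1:
        next_round = [judge(round_[i], round_[i + 1])
                      for i in range(0, len(round_) - 1, 2)]
        if len(round_) % 2 == 1:
            next_round.append(round_[-1])  # the odd one out gets a bye
        round_ = next_round
    return round_[0]

drafts = [f"draft-{n}" for n in (3, 7, 1, 9, 4)]

def judge(a: str, b: str) -> str:
    # Hypothetical judge: prefers the draft with the higher number.
    return a if int(a.split("-")[1]) >= int(b.split("-")[1]) else b

print(tournament(drafts, judge))  # draft-9
```

Pairwise comparison asks the judge only "which of these two is better", which tends to be easier to answer consistently than assigning absolute scores to every candidate.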
06-07

Weekly AI Roundup: Claude Limits Doubled, SpaceX IPO, Microsoft Model Data Contradiction

A roundup of 10 major AI and tech news items from the first week of June 2026. MiniMax M3 was released, priced at $0.6/M tokens and reportedly beating GPT-5.5 on coding benchmarks, though independent verification is pending. DeepSeek raised ~$7.4B in its first external funding round, while Unitree completed its IPO review in a record 73 days. Kimi Work, Coze 3.0, and Qwen3.7-Plus all launched new agent capabilities. Doubao announced subscription plans. ChatGPT surpassed 1 billion monthly active users. Anthropic doubled Claude Cowork's usage limits, confidentially filed for an IPO, and published a report stating Claude writes 80% of its own code. NVIDIA unveiled the ARM-based RTX Spark at Computex. SpaceX is set to IPO on June 12, with Google disclosed as paying $920M/month for compute. Microsoft's MAI-Thinking-1 faced backlash after its claimed "clean data" was revealed to include Common Crawl, and GitHub Copilot's switch to metered billing caused developer bills to spike.

mp.weixin.qq.com · 7 min · AI Engineering · AI Industry · Cost Optimization