Daily · 2026-06-12 - Glean

Fri, Jun 12, 2026 · 3 picks

06:00

Designing loops with Fable 5: self-correction and cross-session memory

How to design loops for Claude Fable 5: self-correction and cross-session memory

R. Lance Martin demonstrates two loop patterns for Anthropic's Fable 5: self-correction and cross-session memory. On the Parameter Golf challenge (train a model under 16MB and within 10 minutes on 8xH100s), Fable 5 with CMA and a verifier sub-agent improved the pipeline roughly 6x more than Opus 4.7, favoring structural changes over scalar tuning. On a continual-learning SQL benchmark, Fable 5 worked through a fail, investigate, verify, distill cycle and condensed its findings into general rules, reaching 73% verification coverage, while Opus 4.7 and Sonnet 4.6 stalled at sparse notes or uncertain schemas. The key takeaway: design loops and environment feedback so the model can hill-climb, rather than relying on direct prompting.

x.com · 5 min · Agent Architecture · Agents · AI Engineering · Anthropic · Context Engineering

06:00

The Missing Link Between Agents and Applications

Headless Tools: letting agents act directly inside browsers and desktop applications

This article introduces Headless Tools, a mechanism that allows agents to act directly on client-side runtimes such as browsers and desktop applications. The author argues that most current agent tools are server-side, limiting them to API calls while blocking access to browser state, device APIs, and in-app actions. Headless Tools wrap client-side capabilities like geolocation, clipboard, IndexedDB, and application-specific commands as standard tools invocable by the model. The model sees only a tool schema, while the server and client coordinate execution behind the scenes. Code examples in TypeScript demonstrate the pattern, alongside real-world use in a Slidev presentation plugin and browser-local agent memory. Privacy is improved because sensitive data can remain on-device. This is valuable for teams embedding AI agents into rich frontend contexts such as design tools, document editors, and desktop utilities.
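The split described above (the model sees only a schema, the client owns the real capability) can be sketched roughly as follows. This is an in-process Python illustration, not the article's TypeScript code or any library's API: `Client`, `Server`, `handle_tool_call`, and the `read_clipboard` tool are all hypothetical names, and the clipboard handler is a stub rather than a real device call.

```python
from typing import Any, Callable, Dict, List

# What the model would see: a schema only (hypothetical tool, for illustration).
CLIPBOARD_TOOL_SCHEMA: Dict[str, Any] = {
    "name": "read_clipboard",
    "description": "Return the current clipboard text from the user's device.",
    "parameters": {"type": "object", "properties": {}},
}

Handler = Callable[[Dict[str, Any]], Any]


class Client:
    """Stands in for a browser or desktop runtime that owns the capability."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._handlers[name] = handler

    def execute(self, name: str, args: Dict[str, Any]) -> Any:
        return self._handlers[name](args)


class Server:
    """Holds schemas only and forwards tool calls to the client."""

    def __init__(self, client: Client, schemas: List[Dict[str, Any]]) -> None:
        self.client = client
        self.schemas = {schema["name"]: schema for schema in schemas}

    def handle_tool_call(self, name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        if name not in self.schemas:
            raise KeyError(f"unknown tool: {name}")
        # The server never touches the clipboard itself; the client does.
        return {"tool": name, "result": self.client.execute(name, args)}


client = Client()
client.register("read_clipboard", lambda args: "hello")  # stubbed device call
server = Server(client, [CLIPBOARD_TOOL_SCHEMA])
response = server.handle_tool_call("read_clipboard", {})
```

In a real deployment the server and client would be separate processes talking over a network channel; the single-process version here only shows where the schema and the handler live.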

x.com · 7 min · AI Agents · AI Engineering · Browser · LangChain · TypeScript

06:00

Training an LLM to Generate Reliable Structured Output Using GRPO and a Reward Function

Replacing labeled data with a reward function: GRPO raises Qwen3-8B's schema-valid JSON output from 62% to 82%

A hands-on report on replacing labeled data with a code-defined reward function to train structured output. The author fine-tunes Qwen3-8B for JSON invoice extraction using GRPO. Supervised fine-tuning stalls because its token-level loss only optimizes for surface similarity, not structural validity. The fix: a reward function that scores completions 0.0 (invalid JSON), 0.5 (valid JSON but wrong schema), or 1.0 (fully compliant), providing a learning gradient. Training on Fireworks H200s raised schema-valid output from a baseline of 62% to 82% on held-out prompts, exceeding GPT-4.1's 58%, with lower cost and latency. The approach transfers to any task where correctness is verifiable in code, such as SQL, API calls, or tool use. Full reward function, dataset, and training config are provided.
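A three-tier reward of the kind described above might look like the sketch below. It is a minimal illustration, not the author's published reward function: the field names in `REQUIRED_FIELDS` and the type-only schema check are assumptions made for the example.

```python
import json

# Hypothetical invoice schema: required field -> accepted Python type(s).
REQUIRED_FIELDS = {
    "invoice_number": str,
    "total": (int, float),
    "currency": str,
}


def reward(completion: str) -> float:
    """Return 0.0 for invalid JSON, 0.5 for valid JSON that misses the
    schema, and 1.0 for a fully compliant object."""
    try:
        data = json.loads(completion)
    except (json.JSONDecodeError, TypeError):
        return 0.0
    if not isinstance(data, dict):
        return 0.5
    for field, expected in REQUIRED_FIELDS.items():
        value = data.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return 0.5
    return 1.0
```

The middle tier is what distinguishes this from a pass/fail check: a completion that parses but misses a field still scores above one that does not parse at all, so partially correct outputs are rewarded over broken ones.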

x.com · 12 min · AI Engineering · Fine-tuning · GRPO · Structured Output

A few picks a day.
