Tags · Agents - Glean

Recent picks

88 picks · chronological

07-10

Build self-improving agent system with Fable 5 in 14 steps : loops, dynamic workflows, routines

This article provides a detailed 14-step roadmap for building a self-improving agent system using Claude Fable 5. It moves Fable 5 from a prompt-and-close tool to a compounding system: using /goal and Outcomes for self-correcting loops, independent verifier sub-agents over self-critique, state files (STATE.md) and Skills for cross-session memory, and Dynamic Workflows and Routines for long-running autonomy. It includes a cost-capability matrix (Fable 5 for orchestration, Sonnet 4.6 for workers, Haiku 4.5 for graders) and guidance on handling the Mythos safety boundary. Suitable for AI engineers and system designers aiming to leverage Fable 5's days-long autonomous capability.

x.com · 28 min · Agent Engineering · Agents · AI Engineering

07-10

Continual Learning for Agents: Eval, A/B, and Self-Improvement at Replit

This article argues that continual learning for agents isn't limited to weight updates—agents using closed frontier models can improve via harness and context layers. Using Replit Agent as a case study, it details a three-layer evaluation system: ViBench, an offline benchmark for vibe coding that scores apps built from scratch against natural-language PRDs; online A/B tests to capture real user behavior; and Telescope, a trace analysis system that clusters failure patterns. These feed a self-improvement loop that automatically proposes patches (reviewed by engineers). A concrete example shows how a cold-start regression was detected, fixed, and shipped in one day. The piece is relevant for engineers building AI agents and evaluation infrastructure.

x.com · 16 min · Agent Architecture · Agents · AI Engineering

07-08

Loop Patterns in Claude Code: A Practical Guide

The Claude Code team's official blog post introduces four loop modes: turn-based, goal-based, time-based, and proactive loops. It covers how each is triggered, stopped, and when to use them, along with token management and code quality tips. Practical CLI commands and SKILL.md examples are provided. The article emphasizes starting simple and gradually automating repetitive tasks. Essential reading for engineers using or evaluating Claude Code for autonomous development.

x.com · 9 min · Agents · AI Engineering · Ai Tooling

07-08

Harness Engineering for Self-Improvement

This comprehensive survey by Lilian Weng systematically examines the critical role of harness engineering in recursive self-improvement (RSI) for AI systems. A harness is the system layer surrounding a base model that orchestrates execution, context management, tool calling, persistent state, and workflow design. The post synthesizes three design patterns (workflow automation, filesystem as persistent memory, sub-agents and backend jobs) and dives into frontier works: context engineering (ACE, MCE), meta-optimization (Meta-Harness), workflow automation search (ADAS, AFlow), self-improving harnesses (STOP, Self-Harness), and evolutionary search (AlphaEvolve, Darwin Gödel Machine). It concludes with open challenges: weak evaluators, memory management, diversity collapse, reward hacking. Essential reading for AI engineers and agent researchers.

lilianweng.github.io · 42 min · Agent Architecture · Agents · AI Engineering

07-07

A Field Guide to Fable: Finding Your Unknowns

The author shares hands-on experience with Claude Fable for agentic coding, emphasizing that the prompt (map) never fully matches the codebase (territory). He categorizes unknowns into four types (known knowns, known unknowns, unknown knowns, unknown unknowns) and provides practical techniques to systematically discover them: blindspot passes, brainstorming & prototypes, interviews, references, implementation plans, implementation notes, pitches, and quizzes. Ends with a real example of editing the Fable launch video. Suitable for engineers using AI-assisted coding.

x.com · 13 min · Agent Engineering · Agents · AI Engineering

07-03

The Claude Opus 4.8 Setup Guide: How to Get Maximum Quality for Minimum Cost (Exact Config Inside)

A hands-on configuration guide published the day after Claude Opus 4.8's release. The core value lies not in benchmark improvements (SWE-bench 87.6% → 88.6%) but in three operational features: Effort Control for per-task reasoning depth, Fast Mode at 3x cheaper than before, and Dynamic Workflows supporting up to 1,000 parallel subagents. The author provides a cost-optimization matrix routing tasks to Haiku/Sonnet/Opus at different effort levels, claiming ~50% monthly savings ($400-600 down to ~$205) for heavy users. Includes copy-paste configs for environment variables and settings.json. Practical for Claude Code users focused on cost control, though the savings claims are unverified estimates.

x.com · 9 min · Agents · Ai Tooling · Claude Code

07-02

Building effective human-agent teams

Anthropic shares four lessons from months of internal testing on building human-agent teams. The shift is from a single-player experience (one human, one AI) to a multiplayer model where agents hold their own credentials, persistent memory, and broad access, joining team channels as full members. The key insights: work in public so agents have context, define clear roles and tool access for every member, set an ambitious north star to make agents proactive, and build trust by granting autonomy gradually. Includes practical examples like agent-led bug backlogs and doer-verifier patterns. A must-read for teams embedding AI agents into collaborative workflows.

claude.com · 16 min · Agent Engineering · Agents · Anthropic

06-29

Context Engineering for AI Agents: The Complete Playbook

This article systematically explains why context engineering is the most critical skill for building reliable AI agents. It argues that agent degradation usually stems from poor context window management rather than model limitations. The context window is likened to RAM, and as tool outputs, retrieval results, and conversation history accumulate, attention thins and the “Lost in the Middle” effect kicks in. Four core strategies are presented: Write (persist information outside context), Select (just-in-time retrieval), Compress (proactively reduce tokens), and Isolate (separate contexts for different jobs). The article details four failure modes—poisoning, distraction, confusion, and clash—and offers concrete evidence: Chroma benchmarks show continuous performance decline well before token limits, RAG‑MCP improved tool selection accuracy from 14% to 43% while halving token usage, and KV‑cache hit rates can yield a 10× cost reduction. A real-world workflow that shipped ~35,000 lines of Rust code in 7 hours using frequent intentional compaction is presented. The target audience is engineers building production‑grade agents.

x.com · 21 min · Agents · AI Engineering · Context Engineering

06-29

Temporary Cloudflare Accounts for AI Agents

Cloudflare introduces temporary accounts for AI agents, enabling deployment via `wrangler deploy --temporary` without manual signup. The accounts last 60 minutes, during which agents can iteratively deploy and developers can permanently claim them. The post addresses the problem of background AI sessions getting stuck at browser-based OAuth flows and explains how the CLI prompts agents about the flag for discovery. A complete TypeScript demo walks through deploying a hello world Worker, modifying it, and redeploying with verification. Partnerships with Stripe and WorkOS are noted as part of broader efforts to reduce agentic deployment friction. Target readers include agent platform builders and developers using coding agents.

x.com · 1 min · Agent Infrastructure · Agents · CLI

06-28

The 5 Levels of Loop Design: From Prompting to Autonomous Agents

The creator of Claude Code says he no longer writes prompts—loops prompt it instead. This post introduces a 5-level progression of human-AI workflow: from Level 1 (single-turn prompting), through Level 2 (manual loop of do-check-correct), Level 3 (verified loop with separate judges for 'done'), Level 4 (self-running loop using /goal command with guardrails), to Level 5 (autonomous systems where loops self-start, run in parallel, and persist lessons into a skill base). Each level comes with a tell and a concrete next step. For developers who still feel they are 'babysitting' their AI agents.

x.com · 7 min · Agent Architecture · Agent Engineering · Agents

06-26

Loop engineering: the 14-step roadmap from prompter to loop designer

This post from @0xCodez on X provides a comprehensive 14-step roadmap for transitioning from manual prompting to designing autonomous loop systems in AI-assisted coding. Based on Anthropic engineering docs, Addy Osmani's essay, and recent studies, it's structured in three tiers: first, a 4-condition test to decide if a loop is warranted; second, five building blocks (automations, worktrees, skills, connectors via MCP, sub-agents with maker-checker split); third, building the minimal viable loop and avoiding failure modes like the 'Ralph Wiggum loop', comprehension debt, and security tax. The author emphasizes that loops are not universal—they only earn their cost when tasks repeat, verification is automated, the token budget can absorb waste, and the agent has senior engineer tools. Ideal for engineers already using coding agents who want to orchestrate them into batched, automated workflows.

x.com · 23 min · Agents · AI Engineering · Ai Tooling

06-24

Loop Engineering: The AI skill every builder needs in 2026

This community-authored article introduces 'Loop Engineering,' arguing that the most effective AI builders are shifting from one-shot prompting to designing automated feedback loops for AI agents. Rather than crafting a perfect prompt, engineers should build systems that discover, plan, execute, verify, and iterate until a verified outcome is reached. It covers six building blocks (automations, worktrees, skills, plugins/connectors, subagents, memory), two loop scales (single-agent vs. fleet), and two types (open vs. closed), while frankly addressing the critical hidden cost of tokens. A practical primer for engineering teams turning AI agents from experiments into production workflows.

x.com · 12 min · Agent Architecture · Agents · AI Engineering

06-23

How To Use Loop Engineering To Build A Self-Improving Quant Trading System

Written by a backend developer working on quant trading systems, this article argues for moving beyond manual prompt-and-wait workflows to building self-running loops. It dissects six universal components of production loops: automation hooks, skill files (SKILL.md), state files (STATE.md), a separate verifier agent, Git worktrees for isolation, and MCP-based connectors. The author then wires these around the five-stage quant trading cycle (data ingestion, signal generation, verification, execution, risk monitoring), with a feedback mechanism that writes lessons back into the skill file for continuous improvement. Targeted at engineers building AI-driven or automated systems, especially in finance.

x.com · 13 min · Agents · Claude Code · Loop Engineering

06-23

30 Core Agentic Engineering Concepts Every Developer Should Know

This article distills 30 foundational concepts in agentic engineering, covering building blocks (Agent loop, Think-Act-Observe, state, patterns), configuration (config files, workflow files, prompt caching, context rot), capability (MCP, live document retrieval, persistent memory), orchestration (subagents, agent loops), guardrails (sandboxing, permissions, hooks, prompt injection defense, pre-commit gates), and observability (tracing, metrics). The author argues that frameworks change but these underlying ideas persist; understanding them makes any new tool familiar. Includes concrete config examples and practical advice (e.g., keep config files under 100 lines, distinguish proxy metrics from outcome metrics).

x.com · 24 min · Agent Architecture · Agents · AI Engineering

06-21

Ponytail: Lazy Senior Dev Inside Your AI Agent, Cuts Code Bloat by ~54%

Ponytail is a rule plugin for 14+ AI coding agents (Claude Code, Codex, Copilot CLI, etc.) that injects a lazy-senior-dev mindset. Before generating code, it forces the agent to climb a ladder: does this need to exist? Can the standard library or native platform feature do it? Can it be one line? Only then writes the minimum viable solution. Benchmarked on real Claude Code sessions editing a real FastAPI + React repository across 12 feature tickets, it cuts lines of code by 54% (mean), tokens by 22%, cost by 20%, and time by 27% while keeping 100% safety on validation, error handling, security, and accessibility. Ideal for developers tired of AI bloat and over-engineering.

github.com · 12 min · Agents · AI Engineering · Code Generation

06-21

A local HTML editor built for human-AI collaboration

Lavish-axi is a local CLI tool that opens AI-generated HTML artifacts in a local browser, allowing developers to annotate elements, select text, take screenshots, and send structured feedback directly back to the AI agent. It runs a local server with a browser chrome, supporting live reload, layout auditing (overflow, clipped text, overlapping text), feedback queuing, and long polling. Built as an AXI, it requires no setup beyond `npx` and can be integrated as a skill into agents like Claude Code. It's ideal for engineers who need to iterate on AI-generated visualizations, plans, or UI mockups with precise feedback.

github.com · 18 min · Agents · Ai Tooling · CLI

06-20

Stop building Foxconn factories for your agents

Garry Tan reflects on his experience building a 540,000-line Rails app, using the Foxconn factory as a metaphor for the dominant AI agent development pattern: wrapping hyper-intelligent models in mountains of code, tests, and guardrails. He argues the economics have inverted—model calls are now cheap and the models are smarter, making the old instinct to ration and control them obsolete. The new paradigm is 'just-in-time software' and 'skill packs,' where lean markdown instructions and minimal TypeScript replace bloated engineering frameworks. A concrete example shows a hackathon judge agent built in an afternoon, doing what previously required a full software project. The essay challenges engineers to abandon the 2013 mental model of measuring capability by lines of code and to embrace 'tokenmaxxing' to gain a 2-3 year competitive advantage. It is aimed at engineers who are coding with AI but still trapped by traditional software metrics and mistrustful architectures.

x.com · 14 min · Agents · Ai Tooling · Code

06-20

Thin Harness, Fat Skills

YC partner Garry Tan argues that the bottleneck in AI agents is not model intelligence but context and process management. He introduces five definitions: Skill files (reusable Markdown procedures), a thin harness (a ~200-line loop for running the model and managing context), resolvers (routing tables that load the right context at the right time), the latent-versus-deterministic boundary (judgment vs. repeatable execution), and diarization (distilling structured intelligence from unstructured data). A real-world example from YC Startup School demonstrates how the same skill file, invoked with different parameters, handles breakout grouping, lunch matching, and real-time pairing, and then improves itself by analyzing mediocre feedback. The piece offers concrete design principles for engineers building agent systems that compound improvements over time.

x.com · 12 min · Agent Architecture · Agents · Ai Tooling

06-20

Skillify: turn every agent failure into a permanent structural fix

Garry Tan presents 'Skillify': a methodology that turns every AI agent failure into a permanent structural fix instead of relying on prompt tweaks or apologies. Using two real failures—an agent bypassing a local script for calendar search and doing mental timezone math—he walks through a 10-step verification checklist: SKILL.md contract, deterministic script, unit tests, integration tests, LLM evals, resolver trigger, resolver eval, reachability audit, smoke test, and brain filing rules. This workflow is built into GBrain, an open-source knowledge engine that ensures agent judgment improves permanently and verifiably. Targeted at developers frustrated by recurring agent mistakes.

x.com · 22 min · Agent Architecture · Agents · Ai-Memory

06-19

Imagine Naked People Were Stupider. Naked Models Are.

YC partner Garry Tan responds to Kyle Kingsbury's anti-AI essay, arguing that Kingsbury's tests of naked models are like testing an engine on a bench and concluding cars are unsafe. The article details the 'thin harness, fat skills' architecture: skill files (reusable Markdown procedures) constrain model input, resolvers (routing tables) dispatch tasks, deterministic code handles precision operations, and testing covers the full pipeline. Using Kingsbury's own bathroom rendering and stock data hallucination examples, Tan shows how architecture can turn unreliable models into reliable systems, and shares a personal resolver that reduced file misfilings from 10/13 to zero. The automotive metaphor concludes: seatbelts, traffic lights, and crumple zones made cars safe, not skepticism of engines. Targeted at engineers building or evaluating AI systems.

x.com · 18 min · Agent Architecture · Agents · Code

06-19

Resolvers: The Routing Table for Intelligence

Garry Tan argues that resolvers, lightweight context routers, are the missing governance layer in agent systems, more crucial than models or skills. Using a mis-filed article as a trigger, he demonstrates how a 200-line resolver replaced 20,000 lines of crammed context, fixing model attention degradation and knowledge base drift. He details a production audit revealing that 10 of 13 skills ignored the resolver, and how he built trigger evals, a "check-resolvable" meta-skill to detect dark capabilities, and a self-healing loop against context rot. The piece reframes resolvers as the organizational chart and management of an agent system, and announces the open-sourcing of his personal architecture (GBrain/GStack) that embodies these patterns. Key evidence: a real agent managing 25,000 files and 200 daily inputs, with concrete metrics on skill reachability defects.

x.com · 18 min · Agent Architecture · Agents · Ai-Memory
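
The routing-table idea in the summary above can be illustrated with a small sketch. It is not GBrain's code: the trigger patterns, skill paths, and function names are hypothetical, and the reachability check is only a guess at what a "check-resolvable" audit might look for (skills that no route can ever load).

```python
import re
from typing import Dict, List

# Hypothetical routing table: trigger pattern -> skill file to load into context.
ROUTES: Dict[str, str] = {
    r"\b(calendar|meeting|schedule)\b": "skills/calendar/SKILL.md",
    r"\b(file|filing|archive)\b": "skills/filing/SKILL.md",
}


def resolve(request: str, routes: Dict[str, str] = ROUTES) -> List[str]:
    """Return the skill files whose trigger pattern matches the request."""
    return [
        path
        for pattern, path in routes.items()
        if re.search(pattern, request, re.IGNORECASE)
    ]


def unreachable_skills(installed: List[str], routes: Dict[str, str] = ROUTES) -> List[str]:
    """Return installed skills that no route points at, so they can never be loaded."""
    routed = set(routes.values())
    return [path for path in installed if path not in routed]
```

Only the matching skill files are loaded for a given request, which is how a short resolver can stand in for a much larger block of always-present context.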

06-18

Stop Giving Every Agent Its Own Skull

Pejman argues that we are replicating a core human limitation—knowledge siloed in individual brains—inside agent systems. Using OpenClaw, Codex, and Claude Code, each agent retains isolated context about him and his projects. The critical gap is not in the repo's artifacts but in the session itself: the debates, dead ends, and pruned idea branches that markdown cannot capture. With literal physical separation across machines, this fragmentation intensifies. The missing layer is a shared, user-owned memory substrate that transcends agent boundaries. He highlights GBrain and CASS as early signals tackling parts of this problem. The piece resonates with engineers building or deeply integrating multi-agent workflows.

x.com · 7 min · Agent Architecture · Agents · Ai-Memory

06-18

Factory 2.0: From coding agents to software factories

Factory announces its 2.0 release, repositioning from individual AI coding agents to an end-to-end 'software factory'. The post argues that improving individual productivity is no longer enough; enterprises need an interconnected, agent-native system that forms a continuous feedback loop from signals (bug reports, customer feedback) through planning, building, testing, reviewing, securing, shipping, and monitoring. Key design principles include model independence (allowing deliberate model choice or automatic routing per task), sovereign intelligence (data plane and control plane options from cloud to fully air-gapped, with all agent sessions and reviews feeding back into the system), and continual learning and self-improvement across the lifecycle. The article lists customers such as NVIDIA, EY, Adobe, and Palo Alto Networks already running software factories in production. Autonomy is described as a gradual maturation process, using simple Droids, skills, automations, Droid Computers, and multi-agent Missions for different levels of human guidance and agent readiness. The piece is a product launch announcement with some technical concepts, targeting engineers and managers interested in enterprise AI engineering and agent orchestration.

x.com · 5 min · Agent Architecture · Agents · AI Engineering

06-17

Agentic coding and persistent returns to expertise

Anthropic analyzed ~400,000 Claude Code sessions, finding that users make most planning decisions while Claude handles execution. Domain expertise, not coding background, is the key to success: expert-rated sessions achieve verified success over twice as often as novices, though intermediate users capture most of the benefit. Non-software occupations succeed at coding tasks within 5 points of software engineers. Over seven months, the share of debugging sessions fell from 33% to 19%, while end-to-end tasks like deployment, data analysis, and document writing grew, and estimated task value rose ~25%. The report details methodology for decision attribution, expertise classification, and success verification, along with limitations. Suitable for engineers and researchers interested in AI coding tools, agent collaboration, and skill transfer.

www.anthropic.com · 27 min · Agents · AI Engineering · Claude Code

06-16

A Local-First Context Compression Layer for AI Agents: Library, Proxy, and MCP in One Stack

Headroom is a local-first context compression layer built specifically for AI coding agents. It slashes token consumption by 60-95% by compressing tool outputs, logs, files, and RAG results before they reach the LLM, all while maintaining answer accuracy. Usable as a Python/TypeScript library, a transparent proxy, a CLI wrapper for popular agents, or an MCP server, it fits into existing workflows without friction. Internally, it combines JSON structure-aware compression, AST-based code minification, and a custom fine-tuned model, grounded by a novel CCR reversible compression system that guarantees original data is never lost. This tool is ideal for engineers who rely heavily on coding agents and want to cut API costs without altering their current toolchain.

github.com · 18 min · Agents · Ast-Minification · Context Engineering

06-15

Claude Official Cookbooks: Engineering Recipes from RAG to Multimodal Agents

Anthropic's official collection of practical coding recipes for building with Claude. It provides runnable Jupyter notebooks covering capabilities like classification, summarization, and RAG, alongside advanced techniques such as tool use, multimodal vision, and sub-agent orchestration. The latest additions introduce the Claude Agent SDK and Managed Agents, demonstrating how to build observable, hostable agents—from research assistants to SRE bots—with just a few lines of code.

github.com · 8 min · Agents · AI Engineering · Anthropic

06-14

Anthropic's Analytics Agent Stack: Tackling Entity Ambiguity, Staleness, and Retrieval Failure

Anthropic’s data team shares how they use Claude to automate 95% of business analytics queries at roughly 95% accuracy. They identify three core failure modes—concept‑entity ambiguity, data staleness, and retrieval failure—and describe a four‑layer agentic stack to address them: data foundations (canonical datasets, rigorous governance), sources of truth (semantic layer, lineage, business knowledge graph), skills (knowledge and procedural skills, which lifted accuracy from ~21% to >95%), and validation (offline evals, adversarial review, online monitoring). The post includes concrete practices such as colocating docs with code, treating metadata as a first‑class product, and an appendix with a skill file skeleton. It is aimed at data engineers and analysts building LLM‑powered self‑service analytics.

claude.com · 32 min · Agents · AI Engineering · Analytics

06-14

Building cloud agent infrastructure: what's different, and what we learned

A hands-on report from CREAO detailing the architectural challenges of moving AI agents from a single-user desktop to a multi-tenant cloud sandbox. It presents two hard-won lessons. First, decouple slowly-changing user environments from fast-changing platform code by freezing user sandboxes into snapshots and hot-swapping the runner library in ~300ms via an atomic sequence involving chattr, V8 compile cache purging, and post-run re-snapshotting. Second, enforce strict credential isolation by ensuring no long-lived secrets ever enter the sandbox; a host-side API bridge verifies sandbox calls using a dual check of IP allowlisting and short-lived, per-run JWTs, so a compromised agent yields only an expiring, network-pinned token. Concrete commands, validation steps, and design rationale included. Recommended for backend and infrastructure engineers productizing agents in shared environments.

x.com · 10 min · Agents · AI Engineering · Infra
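
The dual check in the summary above (IP allowlisting plus a short-lived, per-run token) might look roughly like the sketch below. It swaps in a standard-library HMAC-signed token for the JWTs the report describes, and every name here (`SECRET`, `issue_token`, `verify_call`, the token layout) is a hypothetical stand-in, not CREAO's implementation.

```python
import hashlib
import hmac
import time
from typing import Optional, Set

SECRET = b"host-side-secret"  # hypothetical; kept on the host, never placed in the sandbox


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(run_id: str, sandbox_ip: str, ttl_s: int, now: Optional[float] = None) -> str:
    """Mint a per-run token that expires after ttl_s seconds and is pinned to one IP."""
    expires = int((time.time() if now is None else now) + ttl_s)
    payload = f"{run_id}|{sandbox_ip}|{expires}"
    return f"{payload}|{_sign(payload)}"


def verify_call(token: str, caller_ip: str, allowlist: Set[str], now: Optional[float] = None) -> bool:
    """Host-side bridge check: valid signature, not expired, allowlisted and pinned IP."""
    try:
        run_id, token_ip, expires, sig = token.split("|")
    except ValueError:
        return False  # wrong number of fields
    expected = _sign(f"{run_id}|{token_ip}|{expires}")
    current = time.time() if now is None else now
    return (
        hmac.compare_digest(sig.encode(), expected.encode())  # token was issued by the host
        and current < int(expires)                            # still inside its lifetime
        and caller_ip in allowlist                            # check 1: IP allowlist
        and caller_ip == token_ip                             # check 2: token pinned to this sandbox
    )
```

With this shape, a token leaked from a compromised sandbox is useless from any other address and stops working once its lifetime ends.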

06-14

Hermes Agent: A Self-Improving, Multi-Platform AI Agent Runtime

Hermes Agent is a self-improving AI agent framework with a closed learning loop. It creates skills from experience, manages persistent memory across sessions, and operates over Telegram, Discord, Slack, and CLI via a single gateway. Any LLM backend can be used without code changes, and it runs on a $5 VPS or serverless infrastructure with near-zero idle cost. Built‑in cron scheduling, subagent delegation, and batch trajectory generation make it suitable for engineers and researchers who need an autonomous agent that evolves with use.

github.com · 11 min · Agent-Memory · Agents · CLI

06-13

Claude Agents & Skills for Investment Banking, Research, PE, and Wealth Management

Anthropic's official reference implementation of Claude agents for financial services, offering 9 end-to-end workflow agents for investment banking, research, PE, and wealth management, along with 8 vertical skill packs and 12+ MCP data connectors. Everything is file-based (Markdown/YAML), installable as Cowork plugins or deployable via Managed Agents API. Designed for technical teams who need ready-made finance AI workflows while retaining full customization.

github.com · 19 min · Agents · Anthropic · Financial-Services

06-11

Designing loops with Fable 5: self-correction and cross-session memory

R. Lance Martin demonstrates two loop patterns for Anthropic's Fable 5: self-correction and cross-session memory. On the Parameter Golf challenge (train a model under 16MB and 10 minutes on 8xH100s), Fable 5 with CMA and a verifier sub-agent improved the pipeline roughly 6x more than Opus 4.7, favoring structural changes over scalar tuning. On a continual learning SQL benchmark, Fable 5 progressed through fail-investigate-verify-distill into general rules, reaching 73% verification coverage, while Opus 4.7 and Sonnet 4.6 stalled at sparse notes or uncertain schemas. The key takeaway: design loops and environment feedback so the model can hillclimb, rather than relying on direct prompting.

x.com · 5 min · Agent Architecture · Agents · AI Engineering

06-09

How to Design a Loop That Prompts Your Agent

This article presents a loop architecture that enables an AI agent to autonomously complete multi-step tasks by building an automated prompting system instead of manually crafting each prompt. It breaks down the loop into five parts: defining a 'done' check, building prompts from dynamic state rather than hand-fed instructions, executing actions while capturing all outputs, feeding failures back as the next prompt, and setting hard stop conditions like max turns and cost limits. A walkthrough of fixing a login bug shows the loop in action, emphasizing that real costs come from repeated turns, making guardrails critical. Encapsulating repeated operations into reusable skills is highlighted as the key to long-term value, and common mistakes—like lacking an exit condition or discarding error output—are pointed out. Suitable for developers shifting from one-shot prompts to designing agent control flows.

x.com · 18 min · Agent Architecture · Agents · AI Engineering
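
The five parts listed in the summary above can be sketched as one small control loop. This is a minimal illustration under stated assumptions, not the article's code: `LoopState`, `build_prompt`, `run_loop`, the `run_agent` callable, and the default limits are all hypothetical names and values.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LoopState:
    """Dynamic state that each prompt is built from (hypothetical shape)."""
    task: str
    failures: List[str] = field(default_factory=list)
    turns: int = 0
    cost_usd: float = 0.0


def build_prompt(state: LoopState) -> str:
    # Part 2: the prompt is derived from state, not hand-written each turn.
    prompt = f"Task: {state.task}"
    if state.failures:
        # Part 4: the latest failure output is fed back as part of the next prompt.
        prompt += f"\nLast attempt failed with:\n{state.failures[-1]}"
    return prompt


def run_loop(
    task: str,
    run_agent: Callable[[str], Tuple[str, float]],  # returns (output, cost in USD)
    is_done: Callable[[str], bool],                 # Part 1: the 'done' check
    max_turns: int = 5,
    max_cost_usd: float = 1.0,
) -> Tuple[bool, LoopState]:
    state = LoopState(task=task)
    # Part 5: hard stops on both turn count and spend.
    while state.turns < max_turns and state.cost_usd < max_cost_usd:
        output, cost = run_agent(build_prompt(state))  # Part 3: keep the full output
        state.turns += 1
        state.cost_usd += cost
        if is_done(output):
            return True, state
        state.failures.append(output)  # error output is kept, never discarded
    return False, state
```

Because cost accumulates per turn, the two limits in the `while` condition are what keep a loop with a faulty 'done' check from running indefinitely.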

06-09

AI Agents: What They Are and How to Build a Telegram Bot with Claude Code

This guide clarifies that AI agents are not a category but a spectrum from simple chat to autonomous loops, defined by tools, memory, and a loop. It then provides a no-code, step-by-step tutorial on building a Telegram bot agent with Claude Code, including system prompt templates, systemd deployment, persistent memory, cost tracking, and practical skills. Also addresses the common memory problem and offers concrete fixes. Suitable for engineers who want a practical agent without writing code themselves.

x.com · 17 min · Agents · Ai Tooling · Claude Code

06-08

Composable Agent Skills for Real Engineering Workflows

Matt Pocock's personal agent skills for Claude Code and Codex, targeting four common failure modes in AI-assisted development: misalignment, verbosity, broken code, and design entropy. Instead of controlling the process, these small, composable skills embed engineering fundamentals: grilling sessions for alignment, a shared ubiquitous language for concision, TDD red-green-refactor loops for code quality, and architecture rescue tools. They work with any model and are designed to be hacked and adapted in your own .claude directory.

github.com · 14 min · Agents · AI Engineering · Claude Code

06-08

Every Agentic Engineering Hack I Know (June 2026)

The author shares 22 practical hacks for agentic engineering with Claude Code and Codex. The core is a plan-first workflow: use /ce-plan to generate a plan.md that guides the agent; humans skim or ask inline instead of reading it. Hacks include: voice input via Monologue or Wispr Flow (LLMs handle imperfect transcription); running 4-6 separate agent sessions in cmux tabs; defaulting terminal tabs to Claude Code and bypassing all permission prompts with sound alerts on completion; giving Claude an email address via AgentMail to trigger sessions remotely; using last30days before planning to search community discussions and news in parallel; turning repeated tasks into reusable skills to compound agent capabilities. He stresses that human value lies in providing taste and direction, not typing, and warns against AI addiction. The post is packed with copy-paste config snippets and concrete tools, aimed at engineers deep into AI-assisted development.

x.com · 28 min · Agent Infrastructure · Agents · AI Engineering

06-08

How to Master Dynamic Workflows in Claude Code: 6 Patterns and 14 Steps

This article provides a systematic guide to Dynamic Workflows in Claude Code, shipped in late May 2026. It moves beyond manual prompt chaining by letting Claude generate a bespoke JavaScript harness for a specific task. The author first explains the mental model: how workflows structurally fix agentic laziness, self-preferential bias, and goal drift inherent in single-context sessions. It then breaks down six core patterns—classify-and-act, fan-out-and-synthesize, adversarial verification, generate-and-filter, tournament, and loop until done—each with code skeletons. Real-world use cases show how to compose patterns for migrations, deep research, triage, and lightweight evals. Practical controls like /goal, /loop, token budgets, and the quarantine pattern for untrusted inputs are covered. It also advises on saving successful workflows and shipping them as Skills. This guide is for engineers aiming to tackle long-running, parallel, or adversarial tasks beyond a single Claude Code session.

x.com · 17 min · Agents · AI Engineering · Anthropic
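
Of the six patterns named above, fan-out-and-synthesize is the easiest to show in a few lines. The sketch below is written in Python rather than the JavaScript harness the article describes, and `worker` and `synthesize` are hypothetical callables standing in for sub-agent calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")
S = TypeVar("S")


def fan_out_and_synthesize(
    items: Sequence[T],
    worker: Callable[[T], R],            # one isolated unit of work per item
    synthesize: Callable[[List[R]], S],  # merges the partial results into one answer
    max_parallel: int = 4,
) -> S:
    """Run worker over every item in parallel, then combine the results once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        partials = list(pool.map(worker, items))  # map() preserves input order
    return synthesize(partials)
```

Each worker sees only its own item, so no single context has to hold the whole task; only the synthesis step sees all partial results.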

06-07

10 Lessons for Writing a Good AGENTS.md for Codex and Claude Code

Ten hard-won lessons from running Codex and Claude Code side by side, distilled into a survival guide for writing AGENTS.md files that actually work. Key moves include capping the root file at 200 lines, listing what not to introduce alongside the actual stack, writing rules the tool can mechanically check instead of vague principles like “keep it simple”, and treating the entry file as a router to architecture docs rather than a single dump. Other high-leverage practices involve using PLANS.md to break long-running tasks into reversible phases inside an isolated worktree, giving high-risk directories their own local guardrails, layering intent–intercept–permission–sandbox so red lines aren't left to model memory alone, storing auditable long-term memory in MEMORY.md with a 30‑day hurdle, and separating personal style from team conventions from machine permissions. The guide closes with a copy‑ready skeleton and the principle that the entry file should grow like a test suite every time the tool gets something wrong.

x.com · 24 min · Agents · AI

06-07

The Orchestration Tax: Why 20 AI Agents Don't Mean 20x Output

Addy Osmani coins 'orchestration tax': spawning agents is cheap, but closing the loop—reviewing, judging, merging—is serial and bounded by your cognitive bandwidth. Using Amdahl's Law and Python's GIL as analogies, he argues you are the single-threaded bottleneck in an otherwise concurrent system. Tactics: cap parallelism to your review rate, split work into delegable vs. judgment-heavy piles, batch reviews to cut context-switch costs, and force agents to self-verify. Aimed at engineers who run multiple AI agents daily and feel busy but unproductive.
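
The Amdahl's Law analogy can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name and the 60/40 split between delegable work and serial human review are assumptions, not figures from the post.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Overall speedup when only `parallel_fraction` of the work scales with `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical split: 60% of the work is delegable to agents,
# 40% is serial human review and merging.
for n in (1, 5, 20):
    print(n, round(amdahl_speedup(0.6, n), 2))
```

Under these assumed numbers, 20 agents yield roughly a 2.3x speedup, and no agent count can exceed 2.5x, because the serial review share sets the ceiling.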

x.com · 9 min · Agents · Performance

06-07

Claude API adds auto-caching: single cache_control param cuts input cost to 10%

Anthropic introduced prompt auto-caching in the Claude Messages API. Instead of manually moving breakpoints across conversation turns, a single top-level `cache_control: {type: 'ephemeral'}` auto-places the cache at the last cacheable block. Cached tokens cost 10% of base input price and reduce prefill latency. Ideal for agents and coding assistants where most context remains identical turn-over-turn. The post cites Manus founder @peakji on cache hit rate being the most critical metric for production agents, and links to Claude Code's cache-friendly prompt design insights.
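
To see what the 10% figure means for a whole request, here is a rough cost sketch. The function name and the base price are placeholders, and the calculation deliberately ignores any cost of writing to the cache, which the summary does not cover.

```python
def effective_input_cost(total_tokens: int,
                         cached_fraction: float,
                         base_price_per_mtok: float,
                         cached_price_ratio: float = 0.10) -> float:
    """Input cost of one request when `cached_fraction` of its tokens are cache reads."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_tokens = total_tokens * cached_fraction
    fresh_tokens = total_tokens - cached_tokens
    price_per_token = base_price_per_mtok / 1_000_000
    # Cached tokens are billed at a fraction of the base input price.
    return price_per_token * (fresh_tokens + cached_tokens * cached_price_ratio)


# Placeholder price of 3.0 per million input tokens, 100k-token context.
uncached = effective_input_cost(100_000, 0.0, 3.0)
mostly_cached = effective_input_cost(100_000, 0.9, 3.0)
print(round(mostly_cached / uncached, 2))
```

With 90% of the input served from cache, the request costs about 19% of the uncached price in this model. The headline 10% is approached only as nearly all input tokens become cache hits, which is why hit rate matters so much.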

x.com · 5 min · Agents · LLM

06-06

Lessons from Building Claude Code: Prompt Caching Is Everything

Anthropic engineer shares hard-won lessons from optimizing prompt caching in Claude Code. Prompt caching relies on strict prefix matching, so the order of static vs dynamic content is critical — static system prompts, tools, and context must come first. The post reveals counterintuitive pitfalls: don't update the system prompt mid-conversation (pass updates via messages instead), never switch models or modify tool sets mid-session (it invalidates the entire cache), and when compacting context, reuse the parent conversation's prefix to avoid paying full price for tokens. Practical patterns include using tools like EnterPlanMode to model state transitions, deferring tool loading, and running alerts on cache hit rate. A must-read for anyone building long-running agentic products.
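
The ordering rule follows directly from prefix matching. The toy model below treats a prompt as a list of blocks and counts how many leading blocks two consecutive requests share. It is a simplified illustration with hypothetical names, not how the real cache is implemented.

```python
from typing import Sequence


def shared_prefix_len(previous: Sequence[str], current: Sequence[str]) -> int:
    """Number of leading blocks that are identical in both requests."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


SYSTEM = "system prompt"
TOOLS = "tool definitions"

# Static content first: each turn only appends to the previous request.
static_turn1 = [SYSTEM, TOOLS, "user: turn 1"]
static_turn2 = [SYSTEM, TOOLS, "user: turn 1", "user: turn 2"]

# Dynamic content first: a changing timestamp at the front breaks the prefix.
dynamic_turn1 = ["time: 10:00", SYSTEM, TOOLS, "user: turn 1"]
dynamic_turn2 = ["time: 10:01", SYSTEM, TOOLS, "user: turn 1", "user: turn 2"]

print(shared_prefix_len(static_turn1, static_turn2))    # 3
print(shared_prefix_len(dynamic_turn1, dynamic_turn2))  # 0
```

In this model, one changing block at the front means nothing after it can be reused, even though the system prompt and tools are unchanged.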

x.com · 8 min · Agents · LLM · Performance

06-06

8 proven tips for crafting a CLAUDE.md that truly understands your project

This article distills 8 practical tips for optimizing CLAUDE.md to make Claude Code better aligned with your project: keep it under 200 lines to avoid information overload; maintain a 'do not introduce' list; define actionable coding rules (e.g., use named exports, ban the `any` type); treat CLAUDE.md as a router to other docs, not a library; localize configs for sensitive modules; enforce key rules via hooks; use a MEMORY.md file for cross-session memory; and predefine work style preferences. These insights come from real-world use, backed by concrete examples and contrast cases, targeting engineers who use AI coding assistants.

x.com · 5 min · Agents · AI · LLM

06-06

Why Your AI Agent Is Drowning in Tools (And How Code Mode Saves It)

When an AI agent integrates many MCP tools, it risks context bloat and tool hallucination: 50+ tools can eat 5–7% of the context window. Traditional remedies like agent-side filtering and MCP-side reduction have trade-offs. Code mode lets the LLM search for and execute tools via code, which slashes token usage and enables complex control flow but adds debugging and infrastructure overhead. Cloudflare and Anthropic examples show that the real lesson is to keep a reasonable toolset driven by use cases, not magic numbers.

engineering.leanix.net · 7 min · Agents · Cloudflare · LLM

06-05

What Really Differentiates LLMs Happens After Pretraining: A Full Post-Training Pipeline Breakdown

A comprehensive deep-dive into the full LLM training pipeline, arguing that the real capability gap in 2026 lies not in pretraining but in the post-training stack: instruction tuning, RL, reward design, Agent training, and distillation. The article breaks down the end-to-end process step-by-step — from data recipes and system architecture constraints, through the four-stage post-training pipeline (Cold Start SFT → GRPO-based Reasoning RL → Rejection Sampling FT → Alignment RL), Grader/Reward evaluation loops, Agent training with PARL and Meta-Harness, to distillation and deployment. Key engineering insights include DeepSeek-R1's public recipe, why GRPO simplifies PPO by removing the value network, PRM vs ORM trade-offs, and the shift from optimizing answers to optimizing harness programs. Targeted at engineers who want to trace concrete capability gains back to specific training stages.
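
The point about GRPO dropping the value network can be illustrated by how advantages are computed: each sampled response is scored against the other responses to the same prompt, so no learned baseline is needed. This is a minimal sketch of that group-relative normalization. The function name, the use of the population standard deviation, and the epsilon term are assumptions rather than details from the article.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and spread of its own sample group."""
    mean = fmean(rewards)
    std = pstdev(rewards)
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses to one prompt, scored 1.0 (correct) or 0.0 (incorrect).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Here the correct responses get advantages close to +1 and the incorrect ones close to -1. A group with identical rewards produces all zeros and therefore no learning signal.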

tw93.fun · 27 min · Agents · LLM · Performance

06-05

Building an AI Second Brain with Claude and Obsidian: The Complete Tutorial

A hands-on tutorial on connecting Claude to an Obsidian vault, turning your notes into a queryable knowledge engine that reasons over your own context. Covers vault structure (PARA method), AI-first note design, three Claude integration methods (Projects upload, Claude Code direct access, MCP servers), and five ready-to-use workflows (weekly digest, research synthesis, idea connection, knowledge gap auditing, daily briefing). Best for developers, researchers, and knowledge workers building a persistent personal knowledge system.

x.com · 13 min · Agents · AI · Framework

06-04

Multi-Agent Coordination Patterns: Five Approaches and When to Use Them

This post systematically covers five multi-agent coordination patterns: generator-verifier, orchestrator-subagent, agent teams, message bus, and shared state. For each, it explains the mechanism, where it works well, and known struggles (e.g., verifier quality depends on explicit criteria, the orchestrator becomes an information bottleneck, agent teams require independent subtasks, message bus tracing is hard, shared state risks reactive loops). It recommends starting with the simplest pattern and evolving based on where it struggles, with decision guides comparing patterns side by side. Suitable for engineering teams building multi-agent systems.

claude.com · 19 min · Agents · AI · Framework

06-04

A harness for every task: dynamic workflows in Claude Code

Anthropic engineer Thariq Shihipar details dynamic workflows in Claude Code, where Claude auto-generates custom JavaScript harnesses to orchestrate multiple subagents. It explains how this overcomes single-context-window failures like agentic laziness, self-preferential bias, and goal drift. Common patterns such as classify-and-act, fan-out-and-synthesize, adversarial verification, and tournament are illustrated with concrete use cases from migrations and deep research to root-cause analysis. The post candidly advises that workflows are token-heavy and not needed for routine coding, offering practical tips on token budgets, saving workflows as skills, and pairing with /goal and /loop.

x.com · 15 min · Agent Architecture · Agents · AI Tooling

06-03

The Kimi K2.6 Blueprint: One-Person Agency at $80k/Month

This thread presents a blueprint for a one-person AI agency using Kimi K2.6, claiming to replace an entire dev team. It details the model's MoE architecture (1T params, 32B activated), SWE-Bench score of 65.8, and the Agent Swarm that runs 300 sub-agents in parallel. It also covers the tech stack (Kimi API, CLI, Swarm, MCP servers, n8n), service offerings (lead gen, knowledge bases, support automation), pricing, client acquisition via job listing monitoring, and a cost model projecting $500/month overhead and $72k+ monthly profit. The content leans heavily promotional, with unverified revenue figures.

x.com · 7 min · Agents · AI · LLM

06-03

The third era of AI software development

Cursor reflects on three eras of AI-assisted coding: from Tab autocomplete, to synchronous agents, to cloud agents autonomously handling hour-long tasks. Internally, 35% of merged PRs now come from these agents, and agent users have surpassed Tab users. The developer's role shifts to problem definition, setting review criteria, and parallel orchestration. Agents return reviewable artifacts—logs, videos, previews—rather than diffs.

x.com · 4 min · Agents · AI

06-03

5 Agent Design Patterns for Long-running AI Agents

Google Cloud presents five design patterns for building AI agents that run up to seven days: checkpoint-and-resume for state durability, delegated human-in-the-loop with zero-cost pausing, layered memory governance (Memory Bank, Profiles, Agent Identity/Registry/Gateway) against drift and leakage, ambient event-driven processing with externalized policies, and fleet orchestration using independently deployed specialists. Each pattern includes ADK code examples and diagrams, addressing production concerns like memory drift. Aimed at developers scaling agents from chatbots to autonomous workers.

x.com · 11 min · Agents · AI

06-03

Meta-Meta-Prompting: The Secret to Making AI Agents Work

Garry Tan, CEO of Y Combinator, presents GBrain, his personal AI agent system built on 100,000 pages of structured knowledge and over 100 modular skills. The core architecture follows a “thin harness, fat skills, fat data” philosophy: a lightweight runtime like OpenClaw routes messages to self-contained skill files, which are themselves created and improved by a meta-skill called Skillify. Tan illustrates the compounding value through the “book-mirror” pipeline, which cross-references a book’s ideas with his actual life events, journal entries, and meeting notes. He details the evolution from an error-prone first version to a reliable workflow using multi-model cross-modal evaluation and deep brain retrieval. Other examples include automated meeting preparation that synthesizes months of accumulated context and entity propagation that updates every related person or company page after a conversation. The article provides a concrete architecture overview, evidence of iterative improvement, and a four-step starting guide for developers building personal compounding AI systems.

x.com · 16 min · Agents · AI Tooling · Knowledge Graph

06-02

Designing for Agents: Patterns, Feedback, and Context

Ramp’s MCP weekly active users grew 10x in 3 months; Salesforce launched Headless 360, signaling that 80% of software interaction is shifting to agents. The article proposes a new pattern: User → User’s Agent → Software’s Agent → Database, and offers three practical heuristics: proactively teach calling agents how to succeed (like Notion pre-loading a Markdown spec); build feedback loops via required rationale, a feedback tool, and purpose-built seeds; mind the context gap in agent-to-agent interactions by letting each side contribute what it knows best. Essential reading for product teams building agent-native interfaces.

x.com · 10 min · Agents · LLM

06-02

Getting the most out of Codex

This guide shows how to extend Codex from a code assistant to a persistent work system built around durable threads. Readers will learn: using pinned threads with shortcuts (Command-1–9) to preserve context across sessions; voice input for rough ideas; steering and queuing to correct or schedule tasks mid-flight; heartbeat-triggered thread automations (e.g., periodic Slack/Gmail checks); and long-running Goals with test verifiers. The side panel supports inline review of artifacts, while an Obsidian vault serves as shared memory for cross-thread decisions. For engineers integrating AI deeply into their daily workflow.

x.com · 12 min · Agents · AI

06-02

The Anatomy of an Agent Harness

A deep dive into the 12 components of a production-grade agent harness, synthesizing practices from Anthropic, OpenAI, LangChain, and others. It argues that the harness—not the model—determines real-world agent performance, citing evidence like LangChain's 20+ rank jump on TerminalBench and Claude Code's 95% context reduction. Essential reading for engineers building or debugging AI agents.

x.com · 19 min · Agents · AI · LLM

06-02

A Multi-Agent IDE to Run Claude Code, Codex, and Others in Parallel Git Worktrees

Orca is a desktop and mobile IDE designed to run multiple AI coding agents—such as Claude Code, Codex, and Grok—concurrently. It leverages Git's worktree mechanism to give each agent an isolated working directory, eliminating the need for stashing or branch juggling. Users can observe and control all agents from a single interface with tabbed panes, built-in diff review, and direct GitHub Issue/PR integration. It's built for developers who rely on CLI-based coding agents and need to handle multiple features or refactors in parallel.

github.com · 9 min · Agents · AI Tooling · CLI

06-02

Understand Anything: Turn any codebase into an interactive knowledge graph you can explore

Understand Anything is an open-source tool that turns any codebase into an interactive knowledge graph for exploration, search, and query. Instead of static diagrams, it builds a persistent, navigable knowledge base that integrates with AI coding tools like Claude Code, Cursor, and Codex. It parses code structure and semantic relationships to make logical connections tangible, helping developers quickly onboard legacy systems, locate business logic, or navigate complex codebases.

github.com · 1 min · Agents · LLM

06-01

How to build your own agent harness???

iii proposes an alternative to monolithic agent frameworks: decomposing the harness into a set of independent workers on a shared WebSocket engine. Each of the 15 responsibilities (turn orchestration, auth, policy, approval, session, etc.) is a separate worker that registers functions and triggers on a common bus via iii.trigger(). The architecture makes every layer replaceable—swap the model catalogue by writing a worker that registers models::list, add a provider with provider::<name>::stream, without touching the rest of the stack. The post walks through the turn loop in detail, explaining how provisioning, streaming, tool execution, approval gating, and teardown work across workers. Concrete replacement examples include model catalogue, credential store, approval UI, and policy engine. The entire harness is open source (github.com/iii-hq/workers), with all workers operating under the same protocol, yielding full OpenTelemetry tracing. The key insight: 'build your own harness' means swapping workers, not forking a framework, enabling a slider between thin and thick setups.

x.com · 20 min · Agents · Framework · Workers

06-01

Kimi's Agent Swarm: 300 agents, one prompt, real file outputs.

Kimi's Agent Swarm is an underused multi-agent orchestration system that turns one prompt into real file outputs—resumes, websites, datasets, reports—by coordinating up to 300 domain-specialized agents. This thread by @0xDepressionn shares concrete examples: 100 tailored CVs, a 100,000-word literature review, and 30 landing pages, each replacing thousands of dollars in professional labor. The author distills 15 actionable rules for harnessing Agent Swarm effectively: write project briefs, not questions; batch tasks for leverage; specify output format upfront; attach source files; and save repeatable workflows as Skills. The result is a shift from single-question chatbots to high-volume deliverable generation, making Kimi a cost-effective alternative to expensive services.

x.com · 12 min · Agents · AI · LLM

05-31

Claude Subagents vs. Agent Teams, explained

Compares two Claude multi-agent paradigms: sub-agents (fire-and-forget, isolated context, return compressed results) for embarrassingly parallel tasks, and agent teams (persistent, direct peer communication, shared task list) for ongoing coordination. Provides design principles: decompose by context boundaries, not by roles; start simple and add complexity only when it measurably helps. Covers five orchestration patterns, three situations where multi-agent systems are justified, and common failure modes. Practical advice with code examples for engineers building LLM‑powered agents.

x.com · 11 min · Agents · AI · LLM

05-31

The Complete Guide to /goal, /loop, /schedule & Stop Hooks in Claude Code

A complete guide to four autonomous Claude Code commands that eliminate per‑step babysitting. /goal sets a success condition checked by a fast evaluator after each turn; /loop runs on a fixed time interval; /schedule creates background tasks independent of an open session; Stop Hooks let you programmatically decide when Claude may stop (e.g., via test‑suite scripts). The article provides templates, real examples, condition‑writing rules, and the /goal + Auto mode combination for fully unattended work. It contrasts use‑cases and offers a decision matrix for choosing the right command, turning Claude from a prompt‑driven assistant into an autonomous coding agent.

x.com · 9 min · Agents · AI

05-31

Claude Code /goal: Autonomous Task Completion Without Babysitting

The /goal command in Claude Code enables autonomous task completion by letting it loop turn after turn until a verifiable condition is met. An evaluator model (Claude Haiku by default) checks the transcript each turn. The post explains how to write effective goals (specific, measurable, output-verifiable), project setup tips (CLAUDE.md, hooks, Auto Mode), and common pitfalls like vague goals causing token waste and evaluator hallucination. It compares /goal with /loop and stop hooks. For developers tired of nudging AI, this is a practical guide to hands-off coding sessions.

x.com · 5 min · Agents · AI · LLM

05-31

Building self-improving tax agents with Codex

OpenAI and Thrive Holdings co-developed Tax AI for Crete's accounting firms, using a Codex-driven self-improvement loop. The system processed 7,000 returns with 97% accuracy and 50% throughput increase, cutting one senior accountant's prep time from 180 to 15 hours. The design relies on three pillars: practitioner feedback, production traces, and a Codex iteration cycle. A detailed rental property example shows how practitioner corrections become eval targets, then Codex investigates and proposes fixes. Practical for teams building self-improving agents in expert domains.

openai.com · 15 min · Agents · AI · Performance

05-31

How to Use Claude at 100% — Most People Never Get Past 10%

This guide reveals 17 hidden features of Claude that most users never use, including Projects, Artifacts, Extended Thinking, Memory, Claude in Chrome, Cowork, Scheduled Tasks, Skills, CLAUDE.md, Claude Code, Claude Design, and Prompt Caching. Each feature comes with setup instructions and ready-to-use prompts. Perfect for anyone wanting to turn Claude from a simple chatbot into a full productivity system.

x.com · 16 min · Agents · AI · LLM

05-31

Best Practices for Computer and Browser Use with Claude

Official best practices guide for integrating Claude's computer and browser use capabilities, covering screenshot scaling, click accuracy, cache breakpoints, context management with rolling buffer and server-side compaction, prompt injection defenses, thinking effort tuning, and experimental features like batch tools and the advisor tool. Based on internal testing with Claude 4.6 and Opus 4.7, it includes concrete code and performance data.

claude.com · 59 min · Agents · LLM

05-31

Project Glasswing: What Mythos Showed Us

Cloudflare tested Anthropic's Mythos Preview on 50+ internal repos under Project Glasswing. The model excels at chaining low-severity bugs into working exploits and generating PoCs, making validation actionable. Real-world use revealed inconsistent model refusals and signal-to-noise challenges; a generic coding agent proved ineffective. Cloudflare built an eight-stage harness (Recon, Hunt, Validate, Gapfill, Dedupe, Trace, Feedback, Report) using parallel narrow tasks and adversarial review to improve quality. The post argues that beyond faster patching, defenses must limit exploit reachability from the architecture layer.

blog.cloudflare.com · 18 min · Agents · Infra · LLM

05-30

How to Make Claude Code Fix Its Own Mistakes Automatically (Exact Setup You Can Copy)

This article provides a complete, copy-paste-ready setup for making Claude Code automatically catch, fix, and learn from its own mistakes. It covers a self-growing CLAUDE.md for project rules, PostToolUse hooks for auto-formatting and type-checking, Stop hooks for running tests on completion, PreToolUse hooks for blocking dangerous operations, and cross-session memory. The included settings.json config reduces back-and-forth from 45 minutes to 10 unattended minutes per feature. Audience: engineers using Claude Code or AI coding assistants.

x.com · 10 min · Agents · AI

05-30

How I set up Claude to actually get work done

Most people use Claude for one-off Q&A, losing context each time. The author shares a systematic setup: personal instructions, projects, reference files, a context file, connected tools like email and calendar, templates, and repeatable workflows. 25 concrete steps transform Claude from a chat window into a reusable work environment. Suitable for technical workers frustrated with inconsistent AI responses.

x.com · 9 min · Agents · LLM

05-30

20 AI Concepts You Must Understand in 2026

A beginner-friendly primer covering 20 core AI concepts split into four parts: foundational mechanisms, how LLMs work, how models improve, and how real systems are built. Uses simple analogies and visuals to explain neural networks, transformers, RAG, agents, and more. No code or deep implementation details — a quick reference for building mental models.

x.com · 17 min · Agents · AI · LLM

05-30

Introducing dynamic workflows in Claude Code

Claude Code now supports dynamic workflows, enabling parallel orchestration of tens to hundreds of subagents within a single session for large-scale engineering tasks. It handles end-to-end jobs like codebase-wide bug hunts, migrations across hundreds of files, and security audits. Workflows dynamically plan, fan out, cross-validate, and converge results. Example: Bun was ported from Zig to Rust in 11 days, producing ~750k lines with 99.8% test pass rate. Workflows show plans before execution and can resume after interruption, but they consume significantly more tokens. Available in research preview for Max, Team, and Enterprise users.

claude.com · 7 min · Agents · AI

05-29

Claude Code Dynamic Workflows: A New Primitive That Moves Orchestration Into Code

Anthropic introduces Dynamic Workflows, a primitive that turns task orchestration into JavaScript scripts executed by a deterministic runtime. The script manages loops, branching, and intermediate results, so only the final answer enters the main Claude context—solving the bottleneck of context overflow and attention dilution when coordinating hundreds of parallel tasks. A deep dive into architecture, primitives, and execution model is paired with a real-world Bun-to-Rust migration (11 days, 750K lines, 99.8% test pass) and a personal 133-session analysis. Comparisons with n8n/Coze/Dify show Workflow's advantage: Turing-complete code offers more expressiveness than visual DAGs, and the orchestration can be generated on‑the‑fly by a model. It shines for codebase audits, large migrations, and adversarial verification but comes with high token costs and current preview limits. Target audience: engineers tackling massive, automated coding tasks.

x.com · 19 min · Agents · AI · Framework

05-29

Prompt → Context → Harness: The Three Paradigms of AI Engineering

AI engineering has undergone three paradigm shifts: from Prompt Engineering (2023–2024) to Context Engineering (2025), and now to Harness Engineering (early 2026). Harness Engineering combines evaluation feedback loops, architectural constraints, and memory governance. Anthropic’s evaluator agent turned a 20‑minute useless artifact into a 6‑hour complete game; OpenAI built a million‑line system with zero human‑written code in five months, enforcing architectural boundaries via CI/linters. Two academic papers fill the memory layer: (S)AGE uses Byzantine‑fault‑tolerant Proof of Experience consensus to double agent calibration accuracy; a longitudinal study shows that 3 lines of prompt plus memory matches 200 lines of expert prompt in performance, yet only the memory group improves over time. Essential for engineers building multi‑agent systems.

x.com · 3 min · Agents · AI · LLM

05-29

10x Your Claude Skills with Karpathy's Autoresearch Method

This article shows how to automatically improve Claude Skills using Karpathy's autoresearch method. It works by giving an agent a yes/no checklist and letting it iteratively test, tweak, and keep only beneficial changes. The author improved a landing page copy skill from 56% to 92% pass rate in four rounds, with visible changelogs. The method applies to any measurable task—define a checklist and let the agent run. Includes download links and practical examples for engineers building AI workflows.

x.com · 5 min · Agents · AI

05-29

ClickStack Observability: MCP Server, AI Notebooks, and ClickStack Cloud

At Open House, ClickHouse announced three major observability updates: ClickStack Cloud (serverless, managed, private preview), AI Notebooks (beta), and an open-source ClickStack MCP server. AI Notebooks replace linear chat with persistent, branchable investigation workspaces, exposing every query and step. The MCP server provides semantic investigative tools to external agents; internal benchmarks show 25% fewer tool calls, 2.5× consistency improvement, and 20% higher evaluation scores vs. raw SQL MCP. The server also supports bi‑directional orchestration: agents can create dashboards and persist results. The design philosophy is “bring your own agents,” with SQL as an escape hatch when pre‑built tools fall short. The post includes setup instructions and a demo. For infrastructure/SRE engineers evaluating ClickHouse-based observability.

clickhouse.com · 15 min · Agents · AI · Database

05-29

Introducing ClickHouse Agent Skills

ClickHouse has released official Agent Skills: an open-source set of 28 prioritized best-practice rules covering schema design, query optimization, and data ingestion, packaged using Anthropic's Agent Skills specification. Users can add them locally with `npx skills add clickhouse/agent-skills`. AI agents (e.g., Claude Code) automatically invoke these rules when appropriate, helping avoid common pitfalls like wrong ORDER BY, non-scalable JOINs, or missing materialized views. The Apache 2.0-licensed repo welcomes community contributions.

clickhouse.com · 3 min · Agents · AI · Database

05-29

Organizational Structure for AI-First in the Harness Era

A podcast interview with Creao's founders explores Harness Engineering: building self-healing, self-improving systems around LLMs. True AI-First companies restructure around AI as the primary producer: development cycles shrink from weeks to a day, the product manager role is dismantled, and cross-team alignment is automated. Junior engineers adapt faster than seniors; the future rewards architecture + product + marketing generalists. The 'Agent Economy' means content may be produced for AI consumers. A 25-person team rebuilt their architecture in two weeks. Full transcript available.

x.com · 2 min · Agents · AI · Framework

05-28

Agent Unveiled: Principles, Architecture, and Engineering Practices

This article systematically examines the underlying architecture and engineering practices of agent systems. Starting from a stable agent loop, it contrasts workflows with agents, explains five control patterns, and emphasizes that the harness (evaluation baselines, execution boundaries, feedback, and fallbacks) often matters more than the model itself. It details context engineering via layered management and three compression strategies to prevent context rot, ACI‑oriented tool design, a four‑type memory system with consolidation, long‑task state externalization across sessions, protocol‑based multi‑agent coordination, eval frameworks (Pass@k and Pass^k), and event‑driven observability. Finally, it shows how these principles are implemented in OpenClaw, providing a practical reference for engineers building robust agents.
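
The two eval metrics named above answer different questions: Pass@k asks whether at least one of k attempts succeeds, while Pass^k asks whether all k do. The sketch below uses the commonly cited combinatorial estimators over n recorded trials with c successes. The function names are hypothetical, and the article's own definitions may differ.

```python
from math import comb


def _validate(n: int, c: int, k: int) -> None:
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimated probability that at least one of k sampled trials succeeds."""
    _validate(n, c, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimated probability that all k sampled trials succeed."""
    _validate(n, c, k)
    return comb(c, k) / comb(n, k)


# Hypothetical task: 10 trials, 7 successes, evaluated at k = 3.
print(round(pass_at_k(10, 7, 3), 3))   # 0.992
print(round(pass_hat_k(10, 7, 3), 3))  # 0.292
```

The same 70% per-trial success rate looks near-perfect under Pass@3 and weak under Pass^3, which is why the second metric is the more useful one when an agent has to be reliable on every run.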

tw93.fun · 31 min · Agents · Framework · LLM

05-28

Deconstructing Claude Code: Architecture, Governance, and Engineering Practices

Based on six months of intensive use of Claude Code, the author breaks down its functionality into six layers (CLAUDE.md/rules/memory, Tools/MCP, Skills, Hooks, Subagents, Verifiers) and provides design principles, anti-patterns, and configuration examples for each. The article focuses on context engineering (token cost structure, layered loading strategy, compaction pitfalls), tool design, Hooks for mandatory enforcement, Subagents for context isolation, prompt caching, and verification loops. Ideal for engineers wanting to move from ad‑hoc chat to a disciplined agent engineering workflow.

tw93.fun · 20 min · Agents · AI

05-28

Beyond the Coding Assistant — A New Series

This free series examines AI-assisted software engineering at enterprise scale. While individual coding speed has skyrocketed, many teams have not seen delivery improve—some have even slowed down. The author argues that current AI coding assistants optimize a single role, but software is shipped by teams with many non-coding roles. The next frontier is lifecycle orchestration, not better code generation. The series is structured in four parts, publishing three times a week with no paywall. It is aimed at engineering leaders, architects, and developers interested in AI engineering.

articles.zimetic.com · 8 min · Agents · AI

05-28

how to build a production grade ai agent

Over 40% of agentic AI projects fail, not because of models, but because of weak risk controls, poor architecture, and unclear business value. This article presents ten engineering principles: threat modeling, strictly typed tool contracts, least-privilege execution, compact context engineering, governed retrieval, deterministic orchestration, separated memory, reliability mechanics, full observability, and continuous governance. Each principle provides concrete implementation details and real-world numbers (e.g., prompt injection appears in 73% of deployments), guiding teams to build secure, scalable production-grade agents.

x.com · 20 min · Agents · AI · LLM

05-28

The 8 Levels of Agentic Engineering

Bassim Eledath maps the progression of AI-assisted coding into 8 levels, from tab-complete and AI IDEs to context engineering, compounding engineering, MCPs & skills, harness engineering with automated feedback loops, background agents, and autonomous agent teams. Each level builds on the previous, with practical insights on closing the gap between model capability and practice. He argues that plan mode is fading, multi-model dispatching yields better results, and true autonomous teams are still experimental. The piece serves as a roadmap for engineers looking to leverage AI more effectively.

www.bassimeledath.com · 22 min · Agents · AI · LLM

05-28

How Coding Agents Are Reshaping Engineering, Product and Design

Coding agents are fundamentally reshaping the EPD (Engineering, Product, and Design) collaboration model. With the cost of implementation plummeting, the traditional PRD→mockup→code waterfall is dead, replaced by a review-centric process where prototypes are rapidly generated and then scrutinized. Generalists who wield coding agents gain unprecedented leverage; system thinking and product sense become essential for everyone. The bar for specialization rises, and roles converge into either builders or reviewers. Ultimately, anyone with a deep grasp of both product and technology can thrive, blurring traditional role boundaries.

x.com · 12 min · Agents · AI

05-27

The Future Of Software Engineering with Anthropic

A summary of a roundtable on the future of software engineering, featuring leaders from Stripe, NVIDIA, Microsoft, and others. Key insights: closed-loop development creates compounding gains; test-first is the new default; human code review is fading; comments are written for AI readability; long-horizon tasks remain unsolved; developer tooling is being displaced first; hiring values experimentation over raw skill; human-authored context files help, agent-authored ones can hurt. Candid trade-offs and real-world practices are shared.

www.akashbajwa.co · 12 min · Agents · AI · LLM

05-27

5 Agent Skill Design Patterns Every ADK Developer Should Know

With SKILL.md format standardised across 30+ agent tools, the real challenge is content design. This article distills five recurring patterns from ecosystem-wide practices: Tool Wrapper (on-demand library context), Generator (template fill-in for consistent output), Reviewer (checklist scoring by severity), Inversion (agent-led interview before acting), and Pipeline (strict multi-step with gate conditions). Each pattern includes working ADK code, helping developers build reliable agents.

x.com · 13 min · Agents · AI · Framework

05-27

Your Best Prompt Is a Well-Defined User Story

In the age of agentic development, user story quality directly impacts AI output. The article argues teams should invest more time in breaking down stories and writing clear acceptance criteria rather than just estimating story points. A well-defined story includes three parts: Context, Acceptance Criteria, and Technical Hypothesis. Story point estimation is valuable only when forecasting or surfacing team misalignment is needed; otherwise it can be skipped. A good story acts as a good prompt, accelerating development cycles. Relevant for engineering teams using agile/Scrum.

spin.atomicobject.com · 7 min · Agents · LLM

05-27

Dreaming, Outcomes, and Multiagent Orchestration in Claude Managed Agents

Anthropic launches Dreaming in research preview for Claude Managed Agents: a scheduled process that reviews past sessions and memory to extract patterns, enabling agents to self-improve. Outcomes let developers define rubrics with a separate grader for self-correcting work; internal benchmarks show +10pp task success, +8.4% on docx, +10.1% on pptx. Multiagent orchestration allows a lead agent to decompose tasks to specialist subagents running in parallel with shared filesystem and traceability. Case studies include Harvey (6x completion rate improvement), Netflix (parallel log analysis), Spiral (writing quality via outcomes), and Wisedocs (50% faster document reviews). For engineers building autonomous AI agent systems.

claude.com · 6 min · Agents · LLM

05-27

ByteDance TRAE AI Coding Manuals: Context Engineering as Moat

A distilled summary of the ByteDance TRAE team's 20 internal AI coding practice manuals. The core argument is that the bottleneck in AI coding efficiency is not model capability but context engineering. The article details six methodologies: Context Engineering, Skills, Spec Coding, Rules, MCP, and Agentic Coding, backed by experimental data (e.g., 32 real bug fixes: 100% success with Skills vs 59% without). Suitable for frontline developers, tech leads, and engineering managers.

x.com · 14 min · Agents · AI · LLM

05-27

From Scratch: Build Automated Claude Code Workflows with Hooks

A tutorial on using Claude Code Hooks to automate shell commands at lifecycle events, replacing unreliable prompt instructions. Covers 5 key events (PostToolUse, PreToolUse, etc.), 3 hook types (command/prompt/agent), and config file structure. Provides 5 ready-to-use examples: desktop notification, auto-formatting, file protection, context recovery after compaction, and commit message linting. Exit code 2 blocks dangerous actions and feeds stderr back to Claude. For developers seeking reliable Claude Code workflows.

x.com · 10 min · Agents · AI
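
The file-protection idea in the hooks tutorial above can be sketched roughly as follows. This is a minimal illustration, not the tutorial's own code: it assumes the hook receives a JSON event on stdin whose `tool_input.file_path` field names the file being written, and the `PROTECTED` list, `decide`, and `main` are hypothetical names chosen for this example. The only behaviour taken from the summary is that exit code 2 blocks the action and that stderr is fed back to Claude.

```python
import json
import sys

# Hypothetical path fragments to protect; adjust for your own repository.
PROTECTED = (".env", ".git/", "package-lock.json")


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message) for a single hook event.

    Assumes the event carries the target path at tool_input.file_path.
    Exit code 2 means "block"; 0 means "allow".
    """
    file_path = event.get("tool_input", {}).get("file_path", "")
    for fragment in PROTECTED:
        if fragment in file_path:
            return 2, f"Blocked: {file_path} matches protected pattern '{fragment}'"
    return 0, ""


def main() -> None:
    """Read one JSON event from stdin and exit with the decision."""
    event = json.load(sys.stdin)
    code, message = decide(event)
    if message:
        # The message goes to stderr so it can be relayed back to the model.
        print(message, file=sys.stderr)
    sys.exit(code)

# When saved as a hook script, finish the file with a call to main().
```

A script like this would typically be registered as a command-type PreToolUse hook so it runs before file-editing tools. Keeping the decision in a pure function (`decide`) makes the rule easy to test without spawning the hook.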

05-27

Claude Code in Large Codebases: Best Practices and Getting Started

This article covers how Claude Code navigates large codebases using agentic search instead of RAG indexing, avoiding stale index issues but requiring good context configuration. It details the 'harness' ecosystem around the model—CLAUDE.md, Hooks, Skills, Plugins, MCP servers, LSP integration, and subagents—and presents three configuration patterns from successful deployments: making the codebase navigable, maintaining CLAUDE.md as models evolve, and assigning ownership for rollout. A practical guide for teams adopting Claude Code at scale.

claude.com · 19 min · Agents · AI

05-27

Why Your “AI-First” Strategy Is Probably Wrong

The CTO of an agent platform shares their journey of rebuilding the entire engineering workflow around AI: 99% of production code is written by AI, shipping features within a day. The article critiques the superficial “AI-assisted” approach and introduces “harness engineering,” detailing their tech stack, self-healing feedback loop, and the new engineer roles of Architect and Operator. Real-world results include 3–8 deployments per day. Valuable for teams and CTOs seeking genuine AI integration.

x.com · 19 min · Agents · AI