Tag · AI — Glean

Recent picks

52 picks · chronological

07-06

Better Models: Worse Tools

The author of Pi discovers that Anthropic's Opus 4.8 and Sonnet 5 inject spurious keys (requireUnique, oldText2, cost, etc.) into the edits[] array of Pi's edit tool, while older models do not. The failure is context-dependent and reproducible in agentic sessions. The post dissects Anthropic's tool calling internals: ANTML markers, JSON-serialized nested arrays, and Claude Code's extremely forgiving harness that silently filters unknown keys and retries malformed calls. The author hypothesizes that RL post-training over Claude Code's flat old/new_string schema creates a strong prior, making newer models worse at following non-canonical tool schemas. Strict tool invocation fixes the issue, but Anthropic's complexity limits prevent Claude Code from using it. Key takeaway: tool schemas are not distribution-neutral; any harness must inherit Claude Code's quirks.

lucumr.pocoo.org · 14 min · Agent Engineering · AI · Claude Code
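The difference between a forgiving harness and a strict one can be sketched in a few lines. This is only an illustration: the key names `oldText` and `newText`, the allowed-key set, and both function names are assumptions, and the strict check is a harness-side stand-in rather than Anthropic's strict tool invocation or either tool's real code.

```python
# Hypothetical schema: each edit may carry only these keys (assumed names).
ALLOWED_KEYS = {"oldText", "newText"}


def filter_edits(edits):
    """Forgiving harness: silently drop any key the schema does not define."""
    return [{k: v for k, v in edit.items() if k in ALLOWED_KEYS} for edit in edits]


def validate_edits(edits):
    """Strict harness: reject the whole call if any edit has an unknown key."""
    for index, edit in enumerate(edits):
        unknown = sorted(set(edit) - ALLOWED_KEYS)
        if unknown:
            raise ValueError(f"edits[{index}] has unknown keys: {unknown}")
    return edits


# A call of the kind the summary describes: a valid edit plus a spurious key.
call = [{"oldText": "foo", "newText": "bar", "requireUnique": True}]
cleaned = filter_edits(call)  # the spurious key is removed and the call proceeds
```

With the forgiving path the model never learns that it sent an extra key, which is one plausible way such a habit could go unnoticed; the strict path surfaces it as an error the harness can report back.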

06-22

GLM-5.2: Built for Long-Horizon Tasks

Zhipu AI introduces GLM-5.2, a flagship model for long-horizon tasks with a solid 1M-token context and an MIT license. Architecture innovations include IndexShare, which reuses the sparse attention indexer across four layers to cut per-token FLOPs by 2.9× at 1M context, and an improved MTP layer that raises speculative decoding acceptance length by 20% through IndexShare, KV sharing, rejection sampling, and end-to-end TV loss. Agentic RL post-training is backed by the slime framework, and an anti-hack module detects and blocks reward-hacking behaviors like fetching evaluation files or curl-downloading answers. GLM-5.2 ranks as the top open-source model on long-horizon benchmarks such as FrontierSWE (only 1% behind Opus 4.8) and Terminal-Bench 2.1 (81.0), making it relevant for engineers building coding agents and long-context inference systems.

z.ai · 21 min · Agent Architecture · AI · AI Engineering

06-19

On the LOC controversy: doing the math on a 810x developer output increase

Garry Tan, CEO of Y Combinator, responds to criticism of his claim of shipping 600,000 lines of production code in 60 days. He concedes LOC is a flawed metric but provides a rigorous before-and-after comparison: in 2013, as a part-time coder, he averaged 14 logical lines per day; in 2026, with the same day job, he now averages 11,417. Even after deflation for logical SLOC and an aggressive 2x AI-verbosity factor, the daily rate is 5,708 lines – a 408x increase. Quality data is provided: 2.0% revert rate, 6.3% fix commits, and a test suite that grew from 100 to over 2,000 tests. He details his testing infrastructure (Playwright-based browser CLI, slop-scan), product traction (75k GitHub stars, ~7k WAU), and argues the real shift is the collapse of the “idea to shipping” cycle from weeks to hours. The core argument is that the productivity ground has shifted for all engineers, not just him.

x.com · 12 min · AI · Ai Tooling · Claude Code
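The multiples quoted in this summary can be rechecked with a few lines of arithmetic. The sketch assumes the 2x verbosity factor is applied directly to the 11,417 figure; the variable names are illustrative, and only the three input numbers come from the summary.

```python
baseline_2013 = 14       # logical lines per day in 2013 (from the summary)
raw_2026 = 11_417        # lines per day in 2026 (from the summary)
verbosity_factor = 2     # the "aggressive 2x AI-verbosity" deflation

deflated = raw_2026 / verbosity_factor           # lines/day after deflation
raw_multiple = raw_2026 / baseline_2013          # increase before deflation
deflated_multiple = deflated / baseline_2013     # increase after deflation

print(f"deflated rate: {deflated:.1f} lines/day")
print(f"raw multiple: {raw_multiple:.1f}x")
print(f"deflated multiple: {deflated_multiple:.1f}x")
```

Under that assumption the deflated rate comes to about 5,708 lines per day and the deflated multiple rounds to 408x, matching the summary; the undeflated multiple is a little over 815x, in line with the roughly 810x in the title.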

06-07

10 Lessons for Writing a Good AGENTS.md for Codex and Claude Code

Ten hard-won lessons from running Codex and Claude Code side by side, distilled into a survival guide for writing AGENTS.md files that actually work. Key moves include capping the root file at 200 lines, listing what not to introduce alongside the actual stack, writing rules the tool can mechanically check instead of vague principles like “keep it simple”, and treating the entry file as a router to architecture docs rather than a single dump. Other high-leverage practices involve using PLANS.md to break long-running tasks into reversible phases inside an isolated worktree, giving high-risk directories their own local guardrails, layering intent–intercept–permission–sandbox so red lines aren't left to model memory alone, storing auditable long-term memory in MEMORY.md with a 30‑day hurdle, and separating personal style from team conventions from machine permissions. The guide closes with a copy‑ready skeleton and the principle that the entry file should grow like a test suite every time the tool gets something wrong.

x.com · 24 min · Agents · AI

06-06

8 proven tips for crafting a CLAUDE.md that truly understands your project

This article distills 8 practical tips for optimizing CLAUDE.md to make Claude Code better aligned with your project: keep it under 200 lines to avoid information overload; maintain a 'do not introduce' list; define actionable coding rules (e.g., use named exports, ban any type); treat CLAUDE.md as a router to other docs, not a library; localize configs for sensitive modules; enforce key rules via hooks; use a MEMORY.md file for cross-session memory; and predefine work style preferences. These insights come from real-world use, backed by concrete examples and contrast cases, targeting engineers who use AI coding assistants.

x.com · 5 min · Agents · AI · LLM

06-05

AI Amplifies Output, Not Input: My /learn Workflow for Deep Technical Dives

The author shares a personal workflow for learning technical topics in depth in the AI era: treat learning like coding, structured as collect → filter → outline → draft → AI-assisted tightening → self-review. The core argument is that AI's real value lies in amplifying your output, not in summarizing input. Using a recent deep dive into LLM training as an example, the post introduces the /learn skill in the open-source Waza toolkit to industrialize this process. Recommended for engineers wondering how to maintain depth while leveraging AI.

tw93.fun · 2 min · AI · Framework

06-05

Building an AI Second Brain with Claude and Obsidian: The Complete Tutorial

A hands-on tutorial on connecting Claude to an Obsidian vault, turning your notes into a queryable knowledge engine that reasons over your own context. Covers vault structure (PARA method), AI-first note design, three Claude integration methods (Projects upload, Claude Code direct access, MCP servers), and five ready-to-use workflows (weekly digest, research synthesis, idea connection, knowledge gap auditing, daily briefing). Best for developers, researchers, and knowledge workers building a persistent personal knowledge system.

x.com · 13 min · Agents · AI · Framework

06-04

How to Stop Hitting Claude Usage Limits: 23 Token-Saving Habits

A personal guide to 23 habits that reduce Claude token usage, based on the author's experience and Anthropic docs. Includes converting files before upload, planning in Chat before building files, using edit instead of follow-ups, and voice-to-text for richer prompts. Helps paid users go from hitting limits daily to hitting them once a month. For heavy Claude users.

x.com · 17 min · AI · Performance

06-04

Multi-Agent Coordination Patterns: Five Approaches and When to Use Them

This post systematically covers five multi-agent coordination patterns: generator-verifier, orchestrator-subagent, agent teams, message bus, and shared state. For each, it explains the mechanism, where it works well, and known struggles (e.g., verifier quality depends on explicit criteria, orchestrator becomes information bottleneck, agent teams require independent subtasks, message bus tracing is hard, shared state risks reactive loops). It recommends starting with the simplest pattern and evolving based on where it struggles, with decision guides comparing patterns side by side. Suitable for engineering teams building multi-agent systems.

claude.com · 19 min · Agents · AI · Framework

06-03

Turn Claude into a Consistent Assistant with CLAUDE.md: 21 Essential Instructions

Every new Claude session starts with zero memory, forcing you to re-explain preferences and correct the same mistakes. CLAUDE.md is a persistent instruction file that Claude automatically reads, providing context, voice, and behavioral rules from the very first message. This guide presents 21 practical instructions grouped into communication style, behavior constraints, personal context, session memory, and developer-specific safeguards. Each instruction includes the rationale and a ready-to-use snippet. By creating a CLAUDE.md file with even a few of these rules, you can dramatically improve output consistency and save hours each week. Ideal for engineers, writers, and anyone who uses Claude professionally.

x.com · 15 min · AI · LLM

06-03

The Kimi K2.6 Blueprint: One-Person Agency at $80k/Month

This thread presents a blueprint for a one-person AI agency using Kimi K2.6, claiming to replace an entire dev team. It details the model's MoE architecture (1T params, 32B activated), SWE-Bench score of 65.8, and the Agent Swarm that runs 300 sub-agents in parallel. It also covers the tech stack (Kimi API, CLI, Swarm, MCP servers, n8n), service offerings (lead gen, knowledge bases, support automation), pricing, client acquisition via job listing monitoring, and a cost model projecting $500/month overhead and $72k+ monthly profit. The content leans heavily promotional, with unverified revenue figures.

x.com · 7 min · Agents · AI · LLM

06-03

The third era of AI software development

Cursor reflects on three eras of AI-assisted coding: from Tab autocomplete, to synchronous agents, to cloud agents autonomously handling hour-long tasks. Internally, 35% of merged PRs now come from these agents, and agent users have surpassed Tab users. The developer's role shifts to problem definition, setting review criteria, and parallel orchestration. Agents return reviewable artifacts—logs, videos, previews—rather than diffs.

x.com · 4 min · Agents · AI

06-03

5 Agent Design Patterns for Long-running AI Agents

Google Cloud presents five design patterns for building AI agents that run up to seven days: checkpoint-and-resume for state durability, delegated human-in-the-loop with zero-cost pausing, layered memory governance (Memory Bank, Profiles, Agent Identity/Registry/Gateway) against drift and leakage, ambient event-driven processing with externalized policies, and fleet orchestration using independently deployed specialists. Each pattern includes ADK code examples and diagrams, addressing production concerns like memory drift. Aimed at developers scaling agents from chatbots to autonomous workers.

x.com · 11 min · Agents · AI
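The checkpoint-and-resume pattern named first in this summary can be sketched without any framework. This is not ADK code: `run_steps`, the JSON checkpoint layout, and the `fail_at` crash switch are all hypothetical names chosen for the illustration.

```python
import json
import tempfile
from pathlib import Path


def run_steps(steps, checkpoint: Path, fail_at=None):
    """Run steps in order, persisting progress after each so a restart can resume."""
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())
    else:
        state = {"next": 0, "results": []}
    for index in range(state["next"], len(steps)):
        if index == fail_at:  # simulated crash, for demonstration only
            raise RuntimeError(f"simulated crash before step {index}")
        state["results"].append(steps[index]())
        state["next"] = index + 1
        checkpoint.write_text(json.dumps(state))  # durable after every step
    return state["results"]


calls = []  # records which steps actually executed


def step(name):
    def run():
        calls.append(name)
        return name
    return run


steps = [step("fetch"), step("summarize"), step("email")]
checkpoint = Path(tempfile.mkdtemp()) / "agent_state.json"

try:
    run_steps(steps, checkpoint, fail_at=2)  # first run dies before the last step
except RuntimeError:
    pass
results = run_steps(steps, checkpoint)  # second run resumes from the checkpoint
```

The second call picks up at the unfinished step, so earlier steps are not repeated. A multi-day agent would need the same property from whatever durable store it uses.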

06-02

Getting the most out of Codex

This guide shows how to extend Codex from a code assistant to a persistent work system built around durable threads. Readers will learn: using pinned threads with shortcuts (Command-1–9) to preserve context across sessions; voice input for rough ideas; steering and queuing to correct or schedule tasks mid-flight; heartbeat-triggered thread automations (e.g., periodic Slack/Gmail checks); and long-running Goals with test verifiers. The side panel supports inline review of artifacts, while an Obsidian vault serves as shared memory for cross-thread decisions. For engineers integrating AI deeply into their daily workflow.

x.com · 12 min · Agents · AI

06-02

The Anatomy of an Agent Harness

A deep dive into the 12 components of a production-grade agent harness, synthesizing practices from Anthropic, OpenAI, LangChain, and others. It argues that the harness—not the model—determines real-world agent performance, citing evidence like LangChain's 20+ rank jump on TerminalBench and Claude Code's 95% context reduction. Essential reading for engineers building or debugging AI agents.

x.com · 19 min · Agents · AI · LLM

06-01

Kimi's Agent Swarm: 300 agents, one prompt, real file outputs.

Kimi's Agent Swarm is an underused multi-agent orchestration system that turns one prompt into real file outputs—resumes, websites, datasets, reports—by coordinating up to 300 domain-specialized agents. This thread by @0xDepressionn shares concrete examples: 100 tailored CVs, a 100,000-word literature review, and 30 landing pages, each replacing thousands of dollars in professional labor. The author distills 15 actionable rules for harnessing Agent Swarm effectively: write project briefs, not questions; batch tasks for leverage; specify output format upfront; attach source files; and save repeatable workflows as Skills. The result is a shift from single-question chatbots to high-volume deliverable generation, making Kimi a cost-effective alternative to expensive services.

x.com · 12 min · Agents · AI · LLM

06-01

Orchestrating AI Code Review at Scale

Cloudflare built an AI code review system on OpenCode, orchestrating up to 7 domain-specific agents (security, performance, docs, etc.) via a coordinator. Over 30 days it processed 131k+ reviews with a median latency of 3m39s and average cost of $1.19. The post dives deep into plugin architecture, risk tiers, circuit breakers, incremental re-reviews, prompt injection prevention, and honest limitations. Suitable for engineers exploring AI-assisted development and CI/CD integration at scale.

blog.cloudflare.com · 51 min · AI · Cloudflare · LLM

05-31

Andrej Karpathy says 99% of AI users miss 7 basics. Full breakdown.

Andrej Karpathy — OpenAI co-founder, former Tesla AI head — argues the bottleneck for most AI users isn't the model or the prompt, but the lack of a system around it. This breakdown covers his 7 practical rules: provide full context instead of magic prompts; curate a proper CLAUDE.md; adopt a /raw, /wiki, and config three-layer memory; permanently save strong outputs as reference pages; maintain index.md and log.md for long projects; treat AI as a super-intern with no taste, working in small verified steps; and add one line to render research as navigable HTML. Aimed at engineers stuck in prompt tweaking loops, these habits take an afternoon to set up and compound fast.

x.com · 8 min · AI · LLM

05-31

Claude Subagents vs. Agent Teams, explained

Compares two Claude multi-agent paradigms: sub-agents (fire-and-forget, isolated context, return compressed results) for embarrassingly parallel tasks, and agent teams (persistent, direct peer communication, shared task list) for ongoing coordination. Provides design principles: decompose by context boundaries, not by roles; start simple and add complexity only when measurable. Covers five orchestration patterns, three situations where multi-agent systems are justified, and common failure modes. Practical advice with code examples for engineers building LLM‑powered agents.

x.com · 11 min · Agents · AI · LLM

05-31

The Complete Guide to /goal, /loop, /schedule & Stop Hooks in Claude Code

A complete guide to four autonomous Claude Code commands that eliminate per‑step babysitting. /goal sets a success condition checked by a fast evaluator after each turn; /loop runs on a fixed time interval; /schedule creates background tasks independent of an open session; Stop Hooks let you programmatically decide when Claude may stop (e.g., via test‑suite scripts). The article provides templates, real examples, condition‑writing rules, and the /goal + Auto mode combination for fully unattended work. It contrasts use‑cases and offers a decision matrix for choosing the right command, turning Claude from a prompt‑driven assistant into an autonomous coding agent.

x.com · 9 min · Agents · AI

05-31

Claude Code /goal: Autonomous Task Completion Without Babysitting

The /goal command in Claude Code enables autonomous task completion by letting it loop turn after turn until a verifiable condition is met. An evaluator model (Claude Haiku by default) checks the transcript each turn. The post explains how to write effective goals (specific, measurable, output-verifiable), project setup tips (CLAUDE.md, hooks, Auto Mode), and common pitfalls like vague goals causing token waste and evaluator hallucination. It compares /goal with /loop and stop hooks. For developers tired of nudging AI, this is a practical guide to hands-off coding sessions.

x.com · 5 min · Agents · AI · LLM
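The loop this summary describes, where an evaluator checks the transcript after every turn, has roughly the following shape. It is a sketch with stand-in functions, not Claude Code's implementation: `run_until_goal`, `fake_turn`, and `fake_evaluator` are hypothetical, and a real evaluator would be a model call rather than a string comparison.

```python
from typing import Callable


def run_until_goal(
    take_turn: Callable[[list[str]], str],
    goal_met: Callable[[list[str]], bool],
    max_turns: int = 20,
) -> tuple[bool, list[str]]:
    """Run agent turns until the evaluator accepts the transcript or turns run out."""
    transcript: list[str] = []
    for _ in range(max_turns):
        transcript.append(take_turn(transcript))
        if goal_met(transcript):  # evaluator runs after every turn
            return True, transcript
    return False, transcript


# Toy stand-ins: the "agent" fixes one of three failing tests per turn.
def fake_turn(transcript: list[str]) -> str:
    remaining = 3 - len(transcript) - 1
    return f"failing tests: {remaining}"


def fake_evaluator(transcript: list[str]) -> bool:
    return transcript[-1] == "failing tests: 0"


done, log = run_until_goal(fake_turn, fake_evaluator)
```

The `max_turns` cap is a design choice added here: a vague condition that the evaluator can never confirm would otherwise loop indefinitely, which is the token-waste pitfall the post warns about.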

05-31

Building self-improving tax agents with Codex

OpenAI and Thrive Holdings co-developed Tax AI for Crete's accounting firms, using a Codex-driven self-improvement loop. The system processed 7,000 returns with 97% accuracy and 50% throughput increase, cutting one senior accountant's prep time from 180 to 15 hours. The design relies on three pillars: practitioner feedback, production traces, and a Codex iteration cycle. A detailed rental property example shows how practitioner corrections become eval targets, then Codex investigates and proposes fixes. Practical for teams building self-improving agents in expert domains.

openai.com · 15 min · Agents · AI · Performance

05-31

How to Use Claude at 100% — Most People Never Get Past 10%

This guide reveals 17 hidden features of Claude that most users never use, including Projects, Artifacts, Extended Thinking, Memory, Claude in Chrome, Cowork, Scheduled Tasks, Skills, CLAUDE.md, Claude Code, Claude Design, and Prompt Caching. Each feature comes with setup instructions and ready-to-use prompts. Perfect for anyone wanting to turn Claude from a simple chatbot into a full productivity system.

x.com · 16 min · Agents · AI · LLM

05-30

CLAUDE.md Guide: 21 Instructions to Lock In Preferences and Context

Most Claude users don't know about CLAUDE.md — a plain-text file placed in a project folder that Claude reads automatically at the start of every session, permanently setting your preferences, context, and behavioral rules. This guide provides 21 concrete instructions across five parts: communication style (no filler, admit uncertainty, match length to task), behavior (ask before big changes, only change what was requested, summarize changes), user context (background, project, writing voice), memory & continuity (log decisions in MEMORY.md, session summaries, track failures), and developer-specific rules including Andrej Karpathy's 4 golden rules (don't assume, simplest solution, don't touch unrelated code, flag uncertainty), which reportedly boosted coding accuracy from 65% to 94%. For anyone who wants to stop repeating themselves and get more consistent, on-brand output from Claude.

x.com · 15 min · AI · LLM

05-30

How to Make Claude Code Fix Its Own Mistakes Automatically (Exact Setup You Can Copy)

This article provides a complete, copy-paste-ready setup for making Claude Code automatically catch, fix, and learn from its own mistakes. It covers a self-growing CLAUDE.md for project rules, PostToolUse hooks for auto-formatting and type-checking, Stop hooks for running tests on completion, PreToolUse hooks for blocking dangerous operations, and cross-session memory. The included settings.json config reduces back-and-forth from 45 minutes to 10 unattended minutes per feature. Audience: engineers using Claude Code or AI coding assistants.

x.com · 10 min · Agents · AI

05-30

20 AI Concepts You Must Understand in 2026

A beginner-friendly primer covering 20 core AI concepts split into four parts: foundational mechanisms, how LLMs work, how models improve, and how real systems are built. Uses simple analogies and visuals to explain neural networks, transformers, RAG, agents, and more. No code or deep implementation details — a quick reference for building mental models.

x.com · 17 min · Agents · AI · LLM

05-30

Introducing dynamic workflows in Claude Code

Claude Code now supports dynamic workflows, enabling parallel orchestration of tens to hundreds of subagents within a single session for large-scale engineering tasks. It handles end-to-end jobs like codebase-wide bug hunts, migrations across hundreds of files, and security audits. Workflows dynamically plan, fan out, cross-validate, and converge results. Example: Bun was ported from Zig to Rust in 11 days, producing ~750k lines with 99.8% test pass rate. Workflows show plans before execution, can resume after interruption, but consume significantly more tokens. Available in research preview for Max, Team, Enterprise users.

claude.com · 7 min · Agents · AI

05-29

Claude Code Dynamic Workflows: A New Primitive That Moves Orchestration Into Code

Anthropic introduces Dynamic Workflows, a primitive that turns task orchestration into JavaScript scripts executed by a deterministic runtime. The script manages loops, branching, and intermediate results, so only the final answer enters the main Claude context—solving the bottleneck of context overflow and attention dilution when coordinating hundreds of parallel tasks. A deep dive into architecture, primitives, and execution model is paired with a real-world Bun-to-Rust migration (11 days, 750K lines, 99.8% test pass) and a personal 133-session analysis. Comparisons with n8n/Coze/Dify show Workflow's advantage: Turing-complete code offers more expressiveness than visual DAGs, and the orchestration can be generated on‑the‑fly by a model. It shines for codebase audits, large migrations, and adversarial verification but comes with high token costs and current preview limits. Target audience: engineers tackling massive, automated coding tasks.

x.com · 19 min · Agents · AI · Framework

05-29

Context Engineering Is Replacing Prompt Engineering. Here's How It Works

The author argues that prompt engineering is giving way to 'context engineering'—building the environment of information (identity, knowledge, memory, tools, processes) that enables a model to produce consistent results with minimal prompting. A five-layer framework is detailed, with practical steps for Claude users: set custom instructions, upload knowledge files, actively craft memory, connect MCP tools, and encode processes as Skills. The piece is opinionated and lacks empirical evidence but offers actionable guidance for those heavily using Claude.

x.com · 12 min · AI · Framework · LLM

05-29

Prompt → Context → Harness: The Three Paradigms of AI Engineering

AI engineering has undergone three paradigm shifts: from Prompt Engineering (2023–2024) to Context Engineering (2025), and now to Harness Engineering (early 2026). Harness Engineering combines evaluation feedback loops, architectural constraints, and memory governance. Anthropic’s evaluator agent turned a 20‑minute useless artifact into a 6‑hour complete game; OpenAI built a million‑line system with zero human‑written code in five months, enforcing architectural boundaries via CI/linters. Two academic papers fill the memory layer: (S)AGE uses Byzantine‑fault‑tolerant Proof of Experience consensus to double agent calibration accuracy; a longitudinal study shows that 3 lines of prompt plus memory matches 200 lines of expert prompt in performance, yet only the memory group improves over time. Essential for engineers building multi‑agent systems.

x.com · 3 min · Agents · AI · LLM

05-29

10x Your Claude Skills with Karpathy's Autoresearch Method

This article shows how to automatically improve Claude Skills using Karpathy's autoresearch method. It works by giving an agent a yes/no checklist and letting it iteratively test, tweak, and keep only beneficial changes. The author improved a landing page copy skill from 56% to 92% pass rate in four rounds, with visible changelogs. The method applies to any measurable task—define a checklist and let the agent run. Includes download links and practical examples for engineers building AI workflows.

x.com · 5 min · Agents · AI
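The keep-only-beneficial-changes loop can be sketched as a simple hill climb over a yes/no checklist. Everything here is a toy under stated assumptions: `improve`, `toy_mutate`, `RULES`, and `CHECKS` are hypothetical, and the checks inspect the skill text directly, whereas the method described scores the outputs an agent produces with the skill.

```python
from typing import Callable


def improve(
    skill: str,
    mutate: Callable[[str, int], str],
    checks: list[Callable[[str], bool]],
    rounds: int = 4,
) -> tuple[str, list[str]]:
    """Try one tweak per round; keep it only if the checklist pass rate rises."""

    def pass_rate(text: str) -> float:
        return sum(check(text) for check in checks) / len(checks)

    best, best_score = skill, pass_rate(skill)
    changelog: list[str] = []
    for round_no in range(1, rounds + 1):
        candidate = mutate(best, round_no)
        score = pass_rate(candidate)
        if score > best_score:  # discard neutral or harmful tweaks
            changelog.append(f"round {round_no}: {best_score:.0%} -> {score:.0%}")
            best, best_score = candidate, score
    return best, changelog


# Toy tweak source: append one candidate rule per round.
RULES = ["Use a headline.", "Be verbose.", "End with a call to action."]


def toy_mutate(text: str, round_no: int) -> str:
    return text + " " + RULES[(round_no - 1) % len(RULES)]


# Toy yes/no checklist.
CHECKS = [
    lambda s: "headline" in s,
    lambda s: "call to action" in s,
    lambda s: "verbose" not in s,
]

skill, changelog = improve("Write landing page copy.", toy_mutate, CHECKS)
```

Two of the four toy rounds are rejected (one lowers the pass rate, one leaves it unchanged), so the changelog records only the tweaks that helped.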

05-29

Introducing React Best Practices: A Structured Repo for AI Agents

Vercel distills 10+ years of React and Next.js optimization into a structured repo with 40+ rules across 8 categories, ordered by impact from eliminating waterfalls to JavaScript micro-optimizations. Each rule includes code samples and impact ratings, and compiles into an AGENTS.md document consumable by AI coding agents.

vercel.com · 6 min · AI · Performance · React

05-29

ClickStack Observability: MCP Server, AI Notebooks, and ClickStack Cloud

At Open House, ClickHouse announced three major observability updates: ClickStack Cloud (serverless, managed, private preview), AI Notebooks (beta), and an open-source ClickStack MCP server. AI Notebooks replace linear chat with persistent, branchable investigation workspaces, exposing every query and step. The MCP server provides semantic investigative tools to external agents; internal benchmarks show 25% fewer tool calls, 2.5× consistency improvement, and 20% higher evaluation scores vs. raw SQL MCP. The server also supports bi‑directional orchestration: agents can create dashboards and persist results. The design philosophy is “bring your own agents,” with SQL as an escape hatch when pre‑built tools fall short. The post includes setup instructions and a demo. For infrastructure/SRE engineers evaluating ClickHouse-based observability.

clickhouse.com · 15 min · Agents · AI · Database

05-29

A.I. Should Elevate Your Thinking, Not Replace It

Software engineering is splitting into two groups: those who use AI to remove drudgery and invest in higher-level thinking, and those who outsource their reasoning to AI, simulating competence without building it. This 'outsourced thinking' is a new failure mode that erodes judgment over time. The real value of engineers lies in framing problems, making tradeoffs, and creating clarity—skills AI cannot own. Early-career engineers are especially at risk of skipping essential skill formation. Leadership must learn to differentiate polished output from genuine technical depth. The article argues that organizational health depends on recognizing this divide.

www.koshyjohn.com · 11 min · AI

05-29

Introducing ClickHouse Agent Skills

ClickHouse has released official Agent Skills: an open-source set of 28 prioritized best-practice rules covering schema design, query optimization, and data ingestion, packaged using Anthropic's Agent Skills specification. Users can add them locally with `npx skills add clickhouse/agent-skills`. AI agents (e.g., Claude Code) automatically invoke these rules when appropriate, helping avoid common pitfalls like wrong ORDER BY, non-scalable JOINs, or missing materialized views. The Apache 2.0-licensed repo welcomes community contributions.

clickhouse.com · 3 min · Agents · AI · Database

05-29

Organizational Structure for AI-First in the Harness Era

A podcast interview with Creao's founders explores Harness Engineering: building self-healing, self-improving systems around LLMs. True AI-First companies restructure around AI as the primary producer: development cycles shrink from weeks to a day, the product manager role is dismantled, and cross-team alignment is automated. Junior engineers adapt faster than seniors; the future rewards architecture + product + marketing generalists. The 'Agent Economy' means content may be produced for AI consumers. A 25-person team rebuilt their architecture in two weeks. Full transcript available.

x.com · 2 min · Agents · AI · Framework

05-28

Deconstructing Claude Code: Architecture, Governance, and Engineering Practices

Based on six months of intensive use of Claude Code, the author breaks down its functionality into six layers (CLAUDE.md/rules/memory, Tools/MCP, Skills, Hooks, Subagents, Verifiers) and provides design principles, anti-patterns, and configuration examples for each. The article focuses on context engineering (token cost structure, layered loading strategy, compaction pitfalls), tool design, Hooks for mandatory enforcement, Subagents for context isolation, prompt caching, and verification loops. Ideal for engineers wanting to move from ad‑hoc chat to a disciplined agent engineering workflow.

tw93.fun · 20 min · Agents · AI

05-28

Beyond the Coding Assistant — A New Series

This free series examines AI-assisted software engineering at enterprise scale. While individual coding speed has skyrocketed, many teams have not seen delivery improve—some have even slowed down. The author argues that current AI coding assistants optimize a single role, but software is shipped by teams with many non-coding roles. The next frontier is lifecycle orchestration, not better code generation. The series is structured in four parts, publishing three times a week with no paywall. It is aimed at engineering leaders, architects, and developers interested in AI engineering.

articles.zimetic.com · 8 min · Agents · AI

05-28

CSS Refactoring with an AI Safety Net

The author refactored a tangled CSS codebase into a clean architecture using Claude Code and Playwright, ensuring zero visual changes across seven phases. A Playwright script captured 9 app states, and after each phase, Claude compared screenshots to baseline, catching regressions like a line-height shift. The result: layered CSS with modern reset, unified button classes, and CSS variables. The post details state enumeration, script writing, and AI-driven diffing, and discusses trade-offs with dedicated tools. Essential reading for front-end developers tackling legacy CSS.

danielabaron.me · 12 min · AI · Framework

05-28

Andrej Karpathy wrote something that every Claude Code user has felt b

Andrej Karpathy's three observations about LLM behavior (making silent assumptions, overcomplicating code, and performing careless side effects) inspired a single CLAUDE.md file with four principles: think before coding, prioritize simplicity, make surgical changes, and work toward explicit goals. Each principle directly addresses a specific pain point. The file is ready to drop into any project to guide AI coding assistants toward more disciplined output. For every Claude Code user who has experienced these issues but struggled to articulate them.

x.com · 2 min · AI · LLM

05-28

how to build a production grade ai agent

Over 40% of agentic AI projects fail, not because of models, but due to poor risk controls, architecture, and business value. This article presents ten engineering principles: threat modeling, strictly typed tool contracts, least-privilege execution, compact context engineering, governed retrieval, deterministic orchestration, separated memory, reliability mechanics, full observability, and continuous governance. Each principle provides concrete implementation details and real-world numbers (e.g., prompt injection appears in 73% of deployments), guiding teams to build secure, scalable production-grade agents.

x.com · 20 min · Agents · AI · LLM

05-28

The 8 Levels of Agentic Engineering

Bassim Eledath maps the progression of AI-assisted coding into 8 levels, from tab-complete and AI IDEs to context engineering, compounding engineering, MCPs & skills, harness engineering with automated feedback loops, background agents, and autonomous agent teams. Each level builds on the previous, with practical insights on closing the gap between model capability and practice. He argues that plan mode is fading, multi-model dispatching yields better results, and true autonomous teams are still experimental. The piece serves as a roadmap for engineers looking to leverage AI more effectively.

www.bassimeledath.com · 22 min · Agents · AI · LLM

05-28

How Coding Agents Are Reshaping Engineering, Product and Design

Coding agents are fundamentally reshaping the EPD (Engineering, Product, and Design) collaboration model. With the cost of implementation plummeting, the traditional PRD→mockup→code waterfall is dead, replaced by a review-centric process where prototypes are rapidly generated and then scrutinized. Generalists who wield coding agents gain unprecedented leverage; system thinking and product sense become essential for everyone. The bar for specialization rises, and roles converge into either builders or reviewers. Ultimately, anyone with a deep grasp of both product and technology can thrive, blurring traditional role boundaries.

x.com · 12 min · Agents · AI

05-27

The Future Of Software Engineering with Anthropic

A summary of a roundtable on the future of software engineering, featuring leaders from Stripe, NVIDIA, Microsoft, and others. Key insights: closed-loop development creates compounding gains; test-first is the new default; human code review is fading; comments are written for AI readability; long-horizon tasks remain unsolved; developer tooling is being displaced first; hiring values experimentation over raw skill; human-authored context files help, agent-authored ones can hurt. Candid trade-offs and real-world practices are shared.

www.akashbajwa.co · 12 min · Agents · AI · LLM

05-27

5 Agent Skill Design Patterns Every ADK Developer Should Know

With SKILL.md format standardised across 30+ agent tools, the real challenge is content design. This article distills five recurring patterns from ecosystem-wide practices: Tool Wrapper (on-demand library context), Generator (template fill-in for consistent output), Reviewer (checklist scoring by severity), Inversion (agent-led interview before acting), and Pipeline (strict multi-step with gate conditions). Each pattern includes working ADK code, helping developers build reliable agents.
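As a rough illustration of the Pipeline pattern named above (strict steps, each followed by a gate condition), here is a minimal sketch in plain Python. It does not use the ADK API and is not taken from the article; `Step`, `run_pipeline`, and the example steps are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One pipeline stage (hypothetical structure, not an ADK type)."""
    name: str
    run: Callable[[dict], dict]   # takes the shared state, returns the updated state
    gate: Callable[[dict], bool]  # must return True before the next step may start


def run_pipeline(steps: list[Step], state: dict) -> tuple[dict, list[str]]:
    """Run steps in order and stop at the first step whose gate fails."""
    completed: list[str] = []
    for step in steps:
        state = step.run(state)
        if not step.gate(state):
            break  # gate failed: later steps never run
        completed.append(step.name)
    return state, completed


# Example: the "draft" step produces an empty draft, so its gate fails
# and "publish" is never reached.
steps = [
    Step("outline", lambda s: {**s, "outline": ["intro", "body"]},
         lambda s: len(s["outline"]) >= 2),
    Step("draft", lambda s: {**s, "draft": ""},
         lambda s: bool(s["draft"])),
    Step("publish", lambda s: {**s, "published": True},
         lambda s: True),
]
final_state, completed = run_pipeline(steps, {})
```

The design choice here is that a failed gate halts the run rather than skipping ahead, which is what makes the sequence strict.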

x.com · 13 min · Agents · AI · Framework

05-27

ByteDance TRAE AI Coding Manuals: Context Engineering as Moat

A distilled summary of 20 internal AI coding practice manuals from ByteDance's TRAE team. The core argument is that the bottleneck in AI coding efficiency is not model capability but context engineering. The article details six methodologies: Context Engineering, Skills, Spec Coding, Rules, MCP, and Agentic Coding, backed by experimental data (e.g., across 32 real bug fixes, a 100% success rate with Skills vs 59% without). Suitable for frontline developers, tech leads, and engineering managers.

x.com · 14 min · Agents · AI · LLM

05-27

From Scratch: Build Automated Claude Code Workflows with Hooks

A tutorial on using Claude Code Hooks to automate shell commands at lifecycle events, replacing unreliable prompt instructions. Covers 5 key events (PostToolUse, PreToolUse, etc.), 3 hook types (command/prompt/agent), and config file structure. Provides 5 ready-to-use examples: desktop notification, auto-formatting, file protection, context recovery after compaction, and commit message linting. Exit code 2 blocks dangerous actions and feeds stderr back to Claude. For developers seeking reliable Claude Code workflows.
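The file-protection example and the exit-code-2 behavior described above can be sketched as a small command hook. This is a minimal sketch, not the tutorial's own code: it assumes the hook receives the PreToolUse event as JSON on stdin with `tool_input.file_path` present for file-editing tools, and the `PROTECTED` list is a hypothetical choice.

```python
import json
import sys

# Hypothetical path fragments that the agent should never modify.
PROTECTED = (".env", ".git/", "package-lock.json")


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message) for a PreToolUse event payload.

    Assumes the payload carries the target path at event["tool_input"]["file_path"].
    """
    tool_input = event.get("tool_input") or {}
    path = tool_input.get("file_path", "")
    for fragment in PROTECTED:
        if fragment in path:
            # Exit code 2 blocks the tool call; the message goes to stderr.
            return 2, f"Blocked: {path} matches protected pattern {fragment!r}"
    return 0, ""  # exit code 0 lets the tool call proceed


def main() -> int:
    """Read the event from stdin, report any block reason on stderr."""
    event = json.load(sys.stdin)
    code, message = decide(event)
    if message:
        print(message, file=sys.stderr)
    return code

# A real hook script would end with: sys.exit(main())
```

Keeping the decision in a pure function (`decide`) makes the rule easy to test without piping JSON through a shell.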

x.com · 10 min · Agents · AI

05-27

Claude Code in Large Codebases: Best Practices and Getting Started

This article covers how Claude Code navigates large codebases using agentic search instead of RAG indexing, avoiding stale index issues but requiring good context configuration. It details the 'harness' ecosystem around the model—CLAUDE.md, Hooks, Skills, Plugins, MCP servers, LSP integration, and subagents—and presents three configuration patterns from successful deployments: making the codebase navigable, maintaining CLAUDE.md as models evolve, and assigning ownership for rollout. A practical guide for teams adopting Claude Code at scale.

claude.com · 19 min · Agents · AI

05-27

Why Your “AI-First” Strategy Is Probably Wrong

The CTO of an agent platform describes rebuilding the entire engineering workflow around AI: 99% of production code is written by AI, and features ship within a day. The article critiques the superficial “AI-assisted” approach and introduces “harness engineering,” detailing the team's tech stack, self-healing feedback loop, and the new engineer roles of Architect and Operator. Real-world results include 3–8 deployments per day. Valuable for teams and CTOs seeking genuine AI integration.

x.com · 19 min · Agents · AI

05-27

Using Claude Code: The unreasonable effectiveness of HTML

Thariq Shihipar argues for using HTML instead of Markdown when working with Claude Code. HTML can represent tables, SVG, designs, and interactions, conveying far denser information than Markdown. HTML docs are more readable and shareable, and can include interactive elements. Claude Code can pull context from codebases, Slack, and git history to generate rich HTML reports, prototypes, and review interfaces. Concrete use cases cover planning, code review, design, reporting, and custom editing tools, with reusable prompt examples. For developers seeking to make Claude Code outputs more engaging and actionable.

claude.com · 12 min · AI · LLM

05-27

Claude Can Do All of This. Most People Have No Idea.

This guide covers 17 hidden Claude features: persistent memory via Projects, interactive app building with Artifacts, step-by-step reasoning in Adaptive Thinking, long-term user profiling with Memory, role-based prompts, a browser agent (Claude in Chrome), desktop file-system access (Cowork), scheduled tasks, installable skills, CLAUDE.md project rules, terminal coding with Claude Code, visual design with Claude Design, and 90% cost reduction through Prompt Caching. Each includes where to find it and a ready-to-use prompt.

x.com · 11 min · AI

05-27

How to Actually Use Claude. 18 steps that unlock 100% of its potential

This guide provides 18 actionable steps to fully leverage Claude. It covers setting up Projects and Custom Instructions for persistent context, shifting your mindset to treat Claude as a thinking partner rather than a search engine, and using advanced techniques like style cloning, Extended Thinking, and token-saving prompts. Ready-to-use templates are included for Feynman-style learning, travel planning, expense analysis, and business idea stress-testing. A key insight: simply specifying output length can cut token usage by 40-60%. Aimed at users who want to go beyond basic Q&A and make Claude work for them.

x.com · 10 min · AI · LLM