Continually Improving Our Agent Harness
Cursor describes how it continually improves its agent harness. The post covers five areas: the evolution of the context window from static context to dynamic fetching; a two-layer evaluation system that pairs offline benchmarks with online A/B tests measuring code keep rate and user satisfaction; a pipeline for classifying and repairing tool call errors, built on anomaly detection and automated log analysis with Cloud Agents; per-model customization of tool formats and prompts (for example, patch versus string replacement); and mid-chat model switching with specialized instructions. It concludes with a vision of multi-agent architectures in which the harness orchestrates specialized sub-agents.
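
To make the per-model tool format and tool call error classification ideas concrete, here is a minimal sketch in Python. It is not Cursor's implementation: the model names, the format labels, the error kinds, and every function name below are hypothetical, and it assumes the patch and string replacement formats refer to file edits. It shows a harness choosing a format per model and reporting a failed string replacement with a classifiable error kind.

```python
# Hypothetical mapping from model name to the edit format it is given.
# The names and labels are illustrative assumptions, not taken from the post.
EDIT_FORMAT_BY_MODEL = {
    "model-a": "patch",
    "model-b": "string_replace",
}
DEFAULT_EDIT_FORMAT = "string_replace"


def edit_format_for(model: str) -> str:
    """Pick the edit tool format to expose to a given model."""
    return EDIT_FORMAT_BY_MODEL.get(model, DEFAULT_EDIT_FORMAT)


class ToolCallError(Exception):
    """A failed tool call; `kind` is a coarse label that can be counted in logs."""

    def __init__(self, kind: str, message: str):
        super().__init__(message)
        self.kind = kind


def apply_string_replace(text: str, old: str, new: str) -> str:
    """Apply a string replacement edit, requiring exactly one match."""
    # An empty search string is treated as "no match" rather than matching everywhere.
    count = text.count(old) if old else 0
    if count == 0:
        raise ToolCallError("no_match", "old string not found")
    if count > 1:
        raise ToolCallError("ambiguous_match", f"old string found {count} times")
    return text.replace(old, new, 1)


if __name__ == "__main__":
    source = "a = 1\nb = 2\n"
    print(edit_format_for("model-b"))
    print(apply_string_replace(source, "b = 2", "b = 3"))
    try:
        apply_string_replace("x x", "x", "y")
    except ToolCallError as err:
        print("tool call failed:", err.kind)
```

Attaching a small, fixed set of error kinds to each failure is one way to make errors countable per model and per tool, which is the kind of signal an anomaly detection step could watch. The post itself may classify errors quite differently.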