Tag · Local LLM - Glean

06-30

Practical Guide to Setting Up a Local Coding Agent Stack with Open-Weight Models

This step-by-step tutorial builds a fully local coding agent from open-weight LLMs (primarily Qwen3.6 35B-A3B) served via Ollama and driven by the Qwen-Code harness. The author covers model selection, speed and memory benchmarking with a custom script, a small five-task agent capability evaluation, and a security audit checklist to work through before running any harness. The guide then compares the same local model across three harnesses (Qwen-Code, the open-source Codex, and Claude Code) and finds that Codex achieves the same task success rate with roughly half the token usage of Claude Code. It also explains how to use SSH tunneling to run the model on a dedicated machine (e.g., DGX Spark) while running the harness on the main workstation. The guide is aimed at engineers comfortable with the CLI who want a transparent, inspectable, and free alternative to proprietary coding agents.

magazine.sebastianraschka.com · 45 min · Coding Agent · Local LLM · Ollama

Local LLM

1 pick · chronological
