perf(mcp): answer-directly steering — ~35% cheaper, ~70% fewer tool calls (#224)
* perf(mcp): steer agents to answer directly instead of delegating to subagents CodeGraph beats native grep/read on cost only when the agent queries it directly. When the agent delegates to file-reading sub-agents, those sub-agents read files regardless of the index, so CodeGraph becomes net overhead on top of the reads. The install templates even told agents to "spawn a subagent for explore-class questions" — the expensive path. Changes: - server-instructions + both install templates: add an "Answer directly — don't delegate exploration" directive; reposition codegraph_explore as the efficient one-call multi-symbol tool (was: "spawn a subagent for it"). - codegraph_explore: hard-cap output to its adaptive budget (it overran, ~30k vs a 28k cap) and tighten the medium tier (28k->13k). - codegraph_node: return a member outline for container kinds instead of the full class body. Rigorous N>=4-per-arm warm-block benchmark (median total_cost_usd): excalidraw (~600 files): WITH $0.54 vs native $1.02 (-47%) vscode (~10k files): WITH $0.41 vs native $0.72 (-42%) ky (~25 files): WITH $0.46 vs native $0.44 (wash) Answers were equal-or-better (correct, file:line-cited) with ~6x fewer tool calls; the directive drove the direct path on 14/14 codegraph runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(readme): rebuild benchmark with real-world repos + cost/token/time/tool savings Replace the "Claude Code (Python+Rust/Java)" rows — which benchmarked the Claude Code CLI repo, not real codebases in those languages — with real open-source projects per language: Django (Python), Tokio (Rust), OkHttp (Java), Gin (Go), plus Alamofire (Swift) and the existing TypeScript repos (VS Code, Excalidraw). The table now reports all four savings the change targets — cost, tokens, time, tool calls — as the median of 4 runs per arm (Claude Opus 4.7, headless claude -p, with vs empty MCP config). Averages across the 7 repos: 35% cheaper, 59% fewer tokens, 49% faster, 70% fewer tool calls. Adds a methodology note and raw WITH->WITHOUT medians. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
C
Colby Mchenry committed
f5bbc26c602ac56b9fc5b0a49d0ecaed163e30e6
Parent: a473557
Committed by GitHub <noreply@github.com>
on 5/20/2026, 9:33:50 PM