SIGN IN SIGN UP

perf(mcp): answer-directly steering — ~35% cheaper, ~70% fewer tool calls (#224)

* perf(mcp): steer agents to answer directly instead of delegating to subagents

CodeGraph beats native grep/read on cost only when the agent queries it
directly. When the agent delegates to file-reading sub-agents, those
sub-agents read files regardless of the index, so CodeGraph becomes net
overhead on top of the reads. The install templates even told agents to
"spawn a subagent for explore-class questions" — the expensive path.

Changes:
- server-instructions + both install templates: add an "Answer directly —
  don't delegate exploration" directive; reposition codegraph_explore as the
  efficient one-call multi-symbol tool (was: "spawn a subagent for it").
- codegraph_explore: hard-cap output to its adaptive budget (it overran,
  ~30k vs a 28k cap) and tighten the medium tier (28k->13k).
- codegraph_node: return a member outline for container kinds instead of the
  full class body.

Rigorous N>=4-per-arm warm-block benchmark (median total_cost_usd):
  excalidraw (~600 files):  WITH $0.54 vs native $1.02  (-47%)
  vscode     (~10k files):  WITH $0.41 vs native $0.72  (-42%)
  ky         (~25 files):   WITH $0.46 vs native $0.44  (wash)
Answers were equal-or-better (correct, file:line-cited) with ~6x fewer tool
calls; the directive drove the direct path on 14/14 codegraph runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(readme): rebuild benchmark with real-world repos + cost/token/time/tool savings

Replace the "Claude Code (Python+Rust/Java)" rows — which benchmarked the
Claude Code CLI repo, not real codebases in those languages — with real
open-source projects per language: Django (Python), Tokio (Rust), OkHttp
(Java), Gin (Go), plus Alamofire (Swift) and the existing TypeScript repos
(VS Code, Excalidraw).

The table now reports all four savings the change targets — cost, tokens,
time, tool calls — as the median of 4 runs per arm (Claude Opus 4.7,
headless claude -p, with vs empty MCP config). Averages across the 7 repos:
35% cheaper, 59% fewer tokens, 49% faster, 70% fewer tool calls. Adds a
methodology note and raw WITH->WITHOUT medians.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
C
Colby Mchenry committed
f5bbc26c602ac56b9fc5b0a49d0ecaed163e30e6
Parent: a473557
Committed by GitHub <noreply@github.com> on 5/20/2026, 9:33:50 PM