feat(resolution): gin middleware-chain synthesizer + Opus 4.8 benchmark refresh (#547)

* fix(agent-eval): detect idle by content-stability, not spinner absence

Opus 4.8's extended-thinking TUI shows no spinner, interrupt hint, or
timer while it streams its final answer; those appear only during the
thinking and tool-use phases. The old detector treated ~5s of
not-busy + prompt-present as done, so it killed interactive runs
mid-answer, silently truncating both arms of the tmux A/B (low tool
counts; the final assistant message left as a mid-investigation
preamble).

Now a run is done only when the captured pane stops changing for ~8s;
while streaming, the pane changes every poll, so stability never
accrues. BUSY_RE stays as the immediate busy-reset for the
thinking/tool/live-timer phase. Content-stability is model-agnostic,
so it survives future spinner re-wordings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(readme): refresh VS Code benchmark on v0.9.7 + Opus 4.8

Re-ran the VS Code A/B (headless median-of-4) on the current build and
model. Cost savings held at 26% ($0.66 WITH vs $0.89 WITHOUT), but the
other savings narrowed (tokens 78->63%, time 52->20%, tool calls
85->69%) because Opus 4.8's without-CodeGraph arm explores far more
efficiently than 4.7's did (16 tool calls vs 55, no Explore-subagent
fan-out); the WITH arm is unchanged at 5 calls / 0 reads.

Recomputed the average row and noted that the VS Code row is now a
different model/version epoch than the other six.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(resolution): synthesize gin middleware-chain edges (Next -> registered handlers)

Gin runs its entire handler chain through one dynamic line in
(*Context).Next -- c.handlers[c.index](c), a slice-index dispatch that
tree-sitter can't resolve. So callees(Next) dead-ended at the len()
helper, and the flow ServeHTTP -> handleHTTPRequest -> Next stopped at
the exact symbol a 'how does the middleware chain work' question is
about, sending the agent to re-query and Read/grep (a measured gin
WITH-arm rabbit-hole: 2/4 headless runs spiraled to ~5min, one
mis-firing the opt-in Workflow orchestration tool).

Find the chain dispatcher (a Go method invoking a handlers slice by
index) and link it to every HandlerFunc registered via
.Use/.GET/.../.Handle, so callees(Next) and trace(ServeHTTP, handler)
connect end-to-end. Gated on the dispatcher existing (inert on non-gin
Go repos), named handlers only (inline closures skipped), capped.
Synthesized edges carry provenance heuristic, synthesizedBy
gin-middleware-chain, and registeredAt = the registration site.

Validated: gin callees(Next) now surfaces Logger/Recovery/ErrorLogger
+ handlers (node count stable at 2,544; 5 precise edges). The agent
A/B (headless median-of-4, Opus 4.8) flipped gin's savings from
-58% cost / -129% time (WITH arm worse) to +7% cost / +35% tokens /
+8% time / +38% tool calls, with all 4 WITH runs clean
(0 Read/Grep/Bash). 167/167 unit tests pass, incl. the new
gin-middleware-chain test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(readme): publish uniform Opus 4.8 benchmark + per-repo breakdown accordion

Refresh all 7 benchmark rows to the v0.9.7 / Opus 4.8 headless
median-of-4 (was a mix of the 4.8 VS Code row + six 4.7 rows). New
average: 18% cheaper / 51% fewer tokens / 16% faster / 57% fewer tool
calls; headline + methodology note updated 4.7->4.8.

The gap is smaller than the prior 4.7 numbers because Opus 4.8's
native grep/read is more efficient (the without-arm no longer fans out
into large Explore-subagent sweeps) -- not a CodeGraph regression.
CodeGraph still cuts tool calls and tokens on all 7 repos, with cost
marginal/negative only on django + okhttp.

Adds a top-level 'Per-repo breakdown' accordion (per-metric
Time/Reads/Grep-Bash/Tool calls/Tokens/Cost, WITH vs WITHOUT, per
repo) directly below the condensed summary; methodology/queries/
why-wins move to a second accordion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(changelog): note Gin middleware-chain synthesizer under [Unreleased]

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
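
Illustrative sketch for the fix(agent-eval) entry: a minimal Go model of
"done only when the pane stops changing for a window, with a busy pattern
as an immediate reset". It is not the eval harness's code. The names
StabilityDetector, Observe, DoneAt, and the busyRE pattern are
assumptions made for this example; the real BUSY_RE is not shown in this
commit message.

```go
package main

import (
	"fmt"
	"regexp"
)

// busyRE is a hypothetical stand-in for BUSY_RE; the real pattern is not
// part of this commit message.
var busyRE = regexp.MustCompile(`esc to interrupt|Thinking`)

// StabilityDetector reports a run as done once the captured pane has stayed
// identical, and not busy, for at least windowSec seconds.
type StabilityDetector struct {
	windowSec   float64
	last        string
	stableSince float64
	primed      bool
}

// Observe records one pane capture taken at nowSec and reports whether the
// run can be considered finished.
func (d *StabilityDetector) Observe(pane string, nowSec float64) bool {
	// Any change in content, or any busy marker, restarts the stability clock.
	if !d.primed || pane != d.last || busyRE.MatchString(pane) {
		d.primed = true
		d.last = pane
		d.stableSince = nowSec
		return false
	}
	return nowSec-d.stableSince >= d.windowSec
}

// DoneAt replays captures taken pollSec apart and returns the index of the
// first capture at which the run is considered done, or -1 if never.
func DoneAt(frames []string, pollSec, windowSec float64) int {
	d := &StabilityDetector{windowSec: windowSec}
	for i, f := range frames {
		if d.Observe(f, float64(i)*pollSec) {
			return i
		}
	}
	return -1
}

func main() {
	frames := []string{
		"Thinking... (esc to interrupt)", // busy phase
		"> The chain",                    // streaming: content changes each poll
		"> The chain runs",
		"> The chain runs via Next.",
		"> The chain runs via Next.", // answer finished; pane is now static
		"> The chain runs via Next.",
		"> The chain runs via Next.",
		"> The chain runs via Next.",
	}
	// 2s polls, 8s window: the pane last changed at t=6s, so done at t=14s.
	fmt.Println(DoneAt(frames, 2, 8)) // prints 7
}
```

Because the check compares pane content rather than looking for a
spinner, a streaming answer with no busy indicator still resets the
clock on every poll.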
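
Illustrative sketch for the feat(resolution) entry: a toy Go version of
the edge synthesis described above, linking the chain dispatcher to each
named handler found at a registration site. It is not the synthesizer's
actual implementation. The Registration and Edge types, the
SynthesizeChainEdges function, and the maxEdges parameter are
hypothetical; only the field values heuristic, gin-middleware-chain, and
registeredAt come from the message.

```go
package main

import "fmt"

// Registration is a hypothetical record of one registration call such as
// r.Use(Logger) or r.GET("/ping", pingHandler).
type Registration struct {
	Method  string // "Use", "GET", "Handle", ...
	Handler string // named handler symbol; "" stands for an inline closure
	Site    string // file:line of the registration call
}

// Edge is a hypothetical synthesized call edge with its provenance fields.
type Edge struct {
	From, To      string
	Provenance    string
	SynthesizedBy string
	RegisteredAt  string
}

// SynthesizeChainEdges links the dispatcher to every named handler.
// dispatcher is "" when no method of the shape c.handlers[c.index](c) was
// found, in which case nothing is emitted (inert on non-gin repos).
func SynthesizeChainEdges(dispatcher string, regs []Registration, maxEdges int) []Edge {
	if dispatcher == "" {
		return nil
	}
	seen := make(map[string]bool)
	var edges []Edge
	for _, r := range regs {
		// Skip inline closures and handlers already linked.
		if r.Handler == "" || seen[r.Handler] {
			continue
		}
		// Cap the fan-out so one dispatcher cannot flood the graph.
		if len(edges) >= maxEdges {
			break
		}
		seen[r.Handler] = true
		edges = append(edges, Edge{
			From:          dispatcher,
			To:            r.Handler,
			Provenance:    "heuristic",
			SynthesizedBy: "gin-middleware-chain",
			RegisteredAt:  r.Site,
		})
	}
	return edges
}

func main() {
	regs := []Registration{
		{"Use", "Logger", "main.go:10"},
		{"Use", "Recovery", "main.go:10"},
		{"GET", "", "main.go:12"}, // inline closure: skipped
		{"GET", "pingHandler", "main.go:13"},
		{"POST", "pingHandler", "main.go:14"}, // duplicate handler: skipped
	}
	for _, e := range SynthesizeChainEdges("(*Context).Next", regs, 50) {
		fmt.Printf("%s -> %s (registeredAt %s)\n", e.From, e.To, e.RegisteredAt)
	}
}
```

Keeping the first registration site per handler is a choice made for
this sketch; the message does not say how duplicate registrations are
handled.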
Colby Mchenry committed
f58de8a391259214729b1e8de3524e9589c5e056
Parent: 7a75c82
Committed by GitHub <noreply@github.com>
on 5/29/2026, 4:41:11 AM