feat(eval): add agent-eval harness and /audit + /publish Claude skills
Replaces the old interactive `publish.js` script with two Claude skills and a full agent-evaluation harness:

- `.claude/skills/audit/`: the `/audit` skill drives `scripts/agent-eval/audit.sh` to benchmark retrieval quality (with vs. without codegraph) on a chosen real-world repo from the new `corpus.json` (17 repos across 14 languages).
- `.claude/skills/publish/`: the `/publish` skill orchestrates the full release workflow (preflight → changelog → confirmation gate → bump/build → npm publish → GitHub release), replacing `publish.js`.
- `scripts/agent-eval/`: headless (`run-agent.sh`, `run-all.sh`) and interactive tmux (`itrun.sh`) harnesses with stream-json parsers (`parse-run.mjs`, `parse-session.mjs`) that report tool calls, token usage, and a VERDICT line summarising codegraph_explore vs. Read/Grep counts.
- `run-interactive-test.md`: documents the two harnesses, the idle-detection approach, and what "good" agent behavior looks like after explore-first guidance.
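
For reviewers unfamiliar with the parsers, the tool-call tally behind the VERDICT line could look roughly like the sketch below. It is not the code in `parse-run.mjs`. It assumes each stream-json line is a JSON event, that tool calls appear as `tool_use` blocks inside `assistant` messages, and that the codegraph tool name ends in `codegraph_explore`. The function names `countToolCalls` and `verdictLine` and the exact VERDICT wording are hypothetical.

```javascript
// Hypothetical sketch: tally tool calls from stream-json text (one JSON event per line).
// Assumed event shape: { type: "assistant", message: { content: [{ type: "tool_use", name }] } }
function countToolCalls(streamText) {
  const counts = {};
  for (const line of streamText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let event;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    const content = event?.message?.content;
    if (event?.type !== "assistant" || !Array.isArray(content)) continue;
    for (const block of content) {
      if (block?.type === "tool_use" && typeof block.name === "string") {
        counts[block.name] = (counts[block.name] ?? 0) + 1;
      }
    }
  }
  return counts;
}

// Summarise explore calls against Read/Grep calls in one line (format is an assumption).
function verdictLine(counts) {
  let explore = 0;
  let readGrep = 0;
  for (const [name, n] of Object.entries(counts)) {
    if (name.endsWith("codegraph_explore")) explore += n;
    else if (name === "Read" || name === "Grep") readGrep += n;
  }
  return `VERDICT: codegraph_explore=${explore} Read/Grep=${readGrep}`;
}
```

Matching on the name suffix rather than the full name keeps the sketch independent of however the tool is namespaced in a given run.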
Colby McHenry committed
7fe64b32be0a08b35d737e76dcbb79c79ddea408
Parent: 1cbca5a