feat(eval): add agent-eval harness and /audit + /publish Claude skills

Replaces the old interactive `publish.js` script with two Claude skills and
a full agent-evaluation harness:

- `.claude/skills/audit/` — `/audit` skill drives `scripts/agent-eval/audit.sh`
  to benchmark retrieval quality (with vs. without codegraph) on a chosen
  real-world repo from the new `corpus.json` (17 repos across 14 languages).
- `.claude/skills/publish/` — `/publish` skill orchestrates the full release
  workflow (preflight → changelog → confirmation gate → bump/build → npm
  publish → GitHub release), replacing `publish.js`.
- `scripts/agent-eval/` — headless (`run-agent.sh`, `run-all.sh`) and
  interactive tmux (`itrun.sh`) harnesses with stream-json parsers
  (`parse-run.mjs`, `parse-session.mjs`) that report tool calls, token
  usage, and a VERDICT line summarising codegraph_explore vs. Read/Grep counts.
- `run-interactive-test.md` — documents the two harnesses, idle-detection
  approach, and what "good" agent behavior looks like after explore-first
  guidance.
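
As a rough illustration of the kind of summary the stream-json parsers produce, the sketch below counts tool calls per tool and builds a VERDICT-style line. It is not the code in `parse-run.mjs`. The event shape (`type: "assistant"` with `tool_use` blocks under `message.content`), the function name `summarizeRun`, the suffix match on `codegraph_explore`, and the exact VERDICT wording are all assumptions made for the example.

```javascript
// Hypothetical sketch; not the actual parse-run.mjs implementation.
// Assumes one JSON event per line, with tool calls appearing as
// { type: "tool_use", name: "..." } blocks inside message.content.
function summarizeRun(streamText) {
  const counts = {};
  for (const line of streamText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;

    let event;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (!event || typeof event !== "object") continue;

    const blocks = event.message && event.message.content;
    if (event.type !== "assistant" || !Array.isArray(blocks)) continue;

    for (const block of blocks) {
      if (block && block.type === "tool_use" && typeof block.name === "string") {
        counts[block.name] = (counts[block.name] || 0) + 1;
      }
    }
  }

  // Assumption: the codegraph tool name ends with "codegraph_explore",
  // possibly behind a server prefix.
  let explore = 0;
  for (const [name, n] of Object.entries(counts)) {
    if (name.endsWith("codegraph_explore")) explore += n;
  }
  const fallback = (counts.Read || 0) + (counts.Grep || 0);

  return {
    counts,
    explore,
    fallback,
    verdict: `VERDICT: codegraph_explore=${explore} Read/Grep=${fallback}`,
  };
}
```

Calling `summarizeRun` on the captured output of a headless run returns the per-tool counts alongside the verdict string, so the same result could be printed for a single run or aggregated across a batch.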
Colby McHenry committed
7fe64b32be0a08b35d737e76dcbb79c79ddea408
Parent: 1cbca5a