agent host: share real-SDK integration tests across Copilot and Claude (#316532)
* agent host: share real-SDK integration tests across Copilot and Claude

Refactor the real Copilot SDK integration tests so the cross-provider portion is shared with a new Claude real-SDK suite, with per-provider capability flags for behaviors that differ.

- Extract `defineSharedRealSdkTests` + helpers into `realSdkTestHelpers.ts`.
- Replace `toolApprovalRealSdk.integrationTest.ts` with `copilotRealSdk.integrationTest.ts` (shared suite + Copilot-only tests: usage cost, cd-prefix strip, git-driven diffs).
- Fold the standalone `sessionDiffsRealSdk` test into Copilot's suite.
- Add `claudeRealSdk.integrationTest.ts` gated behind `AGENT_HOST_REAL_SDK=1 AGENT_HOST_REAL_SDK_CLAUDE=1`. SDK directory is resolved from the dev dependency at `node_modules/@anthropic-ai/claude-agent-sdk`. Auth requires an OAuth token with Copilot access (a vanilla `gh auth token` does not work); the second env var ensures the suite isn't auto-enabled.
- Per-test server isolation: each test gets a fresh agent host so a broken test can't poison subsequent ones (notably Claude's mid-turn dispose path).

Real bugs fixed along the way:

- Session URIs are now UUIDs. Claude SDK rejects non-UUID session IDs.
- Dev `product.ts` stub now carries `tokenEntitlementUrl` / `mcpRegistryDataUrl`, so the out-of-sources Claude path no longer hits `Failed to parse URL from undefined` in `CopilotApiService._mintToken`.
- `createRealSession` defaults to `isolation: 'folder'` so the agent runs in the test's working dir instead of silently materializing into `<wd>.worktrees/...`.
- Fix the macOS `/var` <-> `/private/var` mismatch in the diff test by canonicalizing the temp dir via `realpathSync(mkdtempSync(...))`.
- The shell-permission test was Copilot-shaped (assumed a pending `toolCallReady`); Claude's `default` mode auto-approves safe `Bash` at the SDK layer. Test now waits for `toolCallComplete` so it works on both providers.
- Tool names parameterized per provider (`bash`/`Bash`, `task`/`Task`, `exit_plan_mode`/`ExitPlanMode`).
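
The macOS temp-directory fix mentioned above can be sketched as follows. This is a minimal illustration and not the code from the commit: the helper name `createCanonicalTempDir` and the prefix are hypothetical, and only the `realpathSync(mkdtempSync(...))` composition comes from the commit message. On macOS, `/var` is a symlink to `/private/var`, so a temp path built from `os.tmpdir()` may not compare equal to the same directory as reported by a tool that resolves symlinks.

```typescript
import { mkdtempSync, realpathSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper (name assumed for illustration): create a unique temp
// directory and return its canonical, symlink-free path.
function createCanonicalTempDir(prefix: string): string {
    // mkdtempSync appends random characters to the prefix and creates the directory.
    const created = mkdtempSync(join(tmpdir(), prefix));
    // realpathSync resolves symlinks such as macOS's /var -> /private/var, so
    // later path comparisons use the same spelling as symlink-resolving tools.
    return realpathSync(created);
}
```

A test would call `createCanonicalTempDir('agent-host-diff-')` once and use the returned path everywhere it compares or reports file locations.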

Add a deterministic unit test for the `skipPermission: true` flag on the shell-helper tools (`read_bash` / `write_bash` / `bash_shutdown` / `list_bash`) since the original model-driven real-SDK regression test for that flag was inherently flaky.

* address review: lazily probe Claude SDK path, drop console.error

The module-eval-time call to resolveClaudeSdkPath() emitted console.error when the SDK directory was missing, which can fail the test runner even with the suite disabled. Probe filesystem only when the suite is opted in via env vars; return undefined silently otherwise. The suite gate itself surfaces the missing dependency by skipping.
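
The lazy probe described in the review fix could look roughly like the sketch below. The function name `resolveClaudeSdkPath`, the two environment variables, and the SDK location come from the commit message; the signature, the `repoRoot` parameter, the injectable `exists` callback, and the helper `isClaudeSuiteEnabled` are assumptions made so the example is self-contained and testable.

```typescript
import { existsSync } from 'node:fs';
import { join } from 'node:path';

type Env = Record<string, string | undefined>;

// Assumed helper: the suite is opted in only when both variables are set to '1'.
function isClaudeSuiteEnabled(env: Env): boolean {
    return env['AGENT_HOST_REAL_SDK'] === '1' && env['AGENT_HOST_REAL_SDK_CLAUDE'] === '1';
}

// Assumed signature: returns the SDK directory, or undefined without logging.
function resolveClaudeSdkPath(
    repoRoot: string,
    env: Env = process.env,
    exists: (path: string) => boolean = existsSync,
): string | undefined {
    // Not opted in: return before touching the filesystem or the console.
    if (!isClaudeSuiteEnabled(env)) {
        return undefined;
    }
    const candidate = join(repoRoot, 'node_modules', '@anthropic-ai', 'claude-agent-sdk');
    // Opted in but missing: stay silent and let the suite gate skip the tests.
    return exists(candidate) ? candidate : undefined;
}
```

Calling this inside the suite gate, instead of at module evaluation time, keeps a disabled suite from producing any output when the SDK directory is absent.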
Tyler James Leonhardt committed
0d23db45a18835ec8fbcc209d920b64d5d034d33
Parent: 4eb7b6f
Committed by GitHub <noreply@github.com>
on 5/15/2026, 7:28:30 PM