SIGN IN SIGN UP

agent host: share real-SDK integration tests across Copilot and Claude (#316532)

* agent host: share real-SDK integration tests across Copilot and Claude

Refactor the real Copilot SDK integration tests so the cross-provider
portion is shared with a new Claude real-SDK suite, with per-provider
capability flags for behaviors that differ.

- Extract `defineSharedRealSdkTests` + helpers into `realSdkTestHelpers.ts`.
- Replace `toolApprovalRealSdk.integrationTest.ts` with
  `copilotRealSdk.integrationTest.ts` (shared suite + Copilot-only
  tests: usage cost, cd-prefix strip, git-driven diffs).
- Fold the standalone `sessionDiffsRealSdk` test into Copilot's suite.
- Add `claudeRealSdk.integrationTest.ts` gated behind
  `AGENT_HOST_REAL_SDK=1 AGENT_HOST_REAL_SDK_CLAUDE=1`. SDK directory
  is resolved from the dev dependency at
  `node_modules/@anthropic-ai/claude-agent-sdk`. Auth requires an
  OAuth token with Copilot access (a vanilla `gh auth token` does not
  work); the second env var ensures the suite isn't auto-enabled.
- Per-test server isolation: each test gets a fresh agent host so a
  broken test can't poison subsequent ones (notably Claude's mid-turn
  dispose path).

Real bugs fixed along the way:

- Session URIs are now UUIDs. Claude SDK rejects non-UUID session IDs.
- Dev `product.ts` stub now carries `tokenEntitlementUrl` /
  `mcpRegistryDataUrl`, so the out-of-sources Claude path no longer
  hits `Failed to parse URL from undefined` in
  `CopilotApiService._mintToken`.
- `createRealSession` defaults to `isolation: 'folder'` so the
  agent runs in the test's working dir instead of silently
  materializing into `<wd>.worktrees/...`.
- macOS `/var` <-> `/private/var` mismatch in the diff test via
  `realpathSync(mkdtempSync(...))`.
- The shell-permission test was Copilot-shaped (assumed a pending
  `toolCallReady`); Claude's `default` mode auto-approves safe
  `Bash` at the SDK layer. Test now waits for `toolCallComplete` so
  it works on both providers.
- Tool names parameterized per provider (`bash`/`Bash`,
  `task`/`Task`, `exit_plan_mode`/`ExitPlanMode`).

Add a deterministic unit test for the `skipPermission: true` flag on
the shell-helper tools (`read_bash` / `write_bash` /
`bash_shutdown` / `list_bash`) since the original model-driven
real-SDK regression test for that flag was inherently flaky.

* address review: lazily probe Claude SDK path, drop console.error

The module-eval-time call to resolveClaudeSdkPath() emitted
console.error when the SDK directory was missing, which can fail the
test runner even with the suite disabled. Probe filesystem only when
the suite is opted in via env vars; return undefined silently
otherwise — the suite gate itself surfaces the missing dependency by
skipping.
T
Tyler James Leonhardt committed
0d23db45a18835ec8fbcc209d920b64d5d034d33
Parent: 4eb7b6f
Committed by GitHub <noreply@github.com> on 5/15/2026, 7:28:30 PM