SIGN IN SIGN UP

Audit remediation: security, observability, resilience (Phases 1-3) (#499)

* fix(security,correctness): Phase 1 audit remediation

Secrets-at-rest:
- refresh-lease: write token files mode 0600 under a 0700 dir (auth-01)
- oc-chatgpt: atomic 0600 write of merged secret-bearing account file
  (chatgpt-import-01/02)
- refresh-queue: log non-reversible sha256 token fingerprints, not the
  recoverable trailing chars (auth-05)
- debug-bundle: mask account email + redact home-dir paths (cli-manager-06,
  errors-logging-04)
- runtime proxy: mask status.lastError at the exposure boundary
  (errors-logging-08)

Loopback / egress invariants:
- runtime proxy refuses non-loopback bind unless explicitly opted in
  (runtime-proxy-01)
- local-bridge validates runtimeBaseUrl resolves to loopback (runtime-proxy-02)

Default-on correctness:
- host-codex-prompt: restore live sst/opencode URLs; the rebrand-artifact
  URLs 404'd and re-fetched every request (prompts-01)
- context-overflow: emit Responses-API SSE so the client can parse the
  /compact notice (was Anthropic Messages SSE); random synthetic id
  (recovery-01, recovery-11)
- budget eval: include archived ledger rows so rotation can't bypass a
  budget within the active window (quota-forecast-03)
- storage primary read: retry transient FS locks instead of dropping into
  WAL/backup recovery (storage-01)
- config explain: add 3 missing live keys + parity guard test
  (config-01, config-07)

Published type contract:
- bundle @codex-ai/sdk (deps + bundleDependencies) so a consumer's tsc can
  resolve the types re-exported by the published .d.ts (docs-supplychain-01)

Test integrity:
- wrapper routing: add 3 missing auth subcommands; test imports the real
  ACCOUNT_MANAGER_COMMANDS instead of a hardcoded list (cli-manager-01/02)
- property tests: wire setup.ts via setupFiles so fc.configureGlobal applies
  (tests-ci-02)
- vitest: serialize fixed-port tests via fileParallelism:false (tests-ci-03)
- oauth integration: await real port release in afterEach (tests-ci-03)

All changes verified: tsc clean, lint clean, 4073 tests pass, pack-check ok.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(sandbox): redirect HOME/CODEX_HOME to a per-run temp dir (tests-ci-01)

Suites that resolve storage/config paths but forget to redirect HOME /
USERPROFILE / CODEX_HOME / CODEX_MULTI_AUTH_DIR could fall through to the
developer's real ~/.codex and read or clobber live account state. A
setupFiles sandbox now pins all four to a per-worker temp dir before any
test imports application code, so the unsandboxed default is an empty
throwaway dir. Tests that set these vars themselves still override and
restore to the sandbox value, never the real home.

Adds a guard test asserting resolved codex home + multi-auth dir land
under the sandbox root.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(observability,resilience): Phase 2 audit remediation

Observability cluster:
- logger: replace the process-global correlation id with AsyncLocalStorage so
  concurrent runtime-proxy requests cannot read each other's id; keep the
  set/get/clear API backward-compatible and add runWithCorrelationId for
  scoped use (errors-logging-02)
- runtime proxy: inject a structured logger and bind a per-request trace id
  (distinct from sessionKey) around each request; log request failures through
  it with redaction instead of only stashing status.lastError
  (errors-logging-01/03, runtime-proxy-04)

Resilience:
- resolvePath: canonicalize the deepest existing ancestor via realpath and
  re-check containment, so a symlink inside an approved root that resolves
  outside it is rejected (storage-02 symlink TOCTOU)
- removeAccount: clear identity-keyed health/token tracker state and the
  account's circuit breaker so a re-added identity does not inherit stale
  penalties (accounts-02); adds clearAccountKey to both trackers and
  removeCircuitBreaker
- runtime proxy chooseAccount: thread pidOffsetEnabled into the hybrid
  selector so the default-on proxy spreads load like the plugin-host path
  instead of stampeding one account (accounts-05)

Tests: AsyncLocalStorage isolation, symlink-escape accept/reject, and
remove-then-fresh-health regression tests added.

Verified: tsc clean, npm test 3x consecutive green (270 files / 4080 tests).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(prompts): harden GitHub prompt fetch path (Phase 2c)

Add lib/prompts/fetch-utils.ts and route both prompt fetchers
(codex.ts, host-codex-prompt.ts) through it:
- fetchWithTimeout: bounded AbortSignal timeout so a hung GitHub
  connection cannot stall the request pipeline (prompts-02)
- readBodyTextGuarded: 1 MB size cap (Content-Length + streamed
  enforcement) and rejection of empty/whitespace-only bodies so a bad
  200 is not cached and served as instructions (prompts-04/05)
- withPromptFetchHeaders: always send User-Agent + Accept, since
  api.github.com rejects requests without a UA (prompts-08)

Also revert the earlier fileParallelism:false experiment in
vitest.config.ts. Investigation (8 runs each on this branch vs pristine
origin/main) showed the suite has a pre-existing, environment-level
intermittent vitest worker crash on Windows (exit 1, no test failure, no
summary) that reproduces on upstream main at the same rate; it is
unrelated to fileParallelism, which added no protection beyond the npm
--maxWorkers=1 flag. Tracked as finding tests-ci-16.

Tests: 9 fetch-utils unit tests (timeout/size/empty/headers); host-prompt
header assertions updated. Prompt suites: 58 tests green; tsc + lint clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(ui): honor NO_COLOR / FORCE_COLOR / TTY for color output (ui-04)

createUiTheme now blanks all ANSI color tokens when color should be
disabled, via shouldDisableColor():
- NO_COLOR set (any value) disables color (no-color.org)
- FORCE_COLOR=0/false forces off; any other FORCE_COLOR value forces on
- otherwise color is off when stdout is not a TTY (piped/redirected)
Glyphs are unaffected. Callers may pass disableColor explicitly.

Theme/format tests that assert ANSI codes now opt into color explicitly
(disableColor:false) since the test env sets NO_COLOR/FORCE_COLOR=0; adds
dedicated gating tests for shouldDisableColor and the blanked theme.

Verified: tsc + lint clean; full suite 271 files / 4095 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(runtime): wire the routing mutex into the proxy (accounts-01/08)

The routingMutex config flag + *Locked methods + withRoutingMutex were
fully implemented and tested but had zero production callers, so the flag
implied a concurrency guarantee the synchronous selector never provided.

Wire it on the path that actually needs it:
- apply getRoutingMutexMode(pluginConfig) to the account manager at proxy
  startup and on every manager-reload path
- route persistRuntimeActiveAccount's commit through markSwitchedLocked,
  since that path spans an await (syncCodexCliActiveSelectionForIndex) and
  is the real lost-update window the mutex targets

The synchronous chooseAccount+markSwitched sites are left as-is: with no
await between read and write they are already event-loop-atomic, so a
mutex there would add overhead without closing a race. Legacy mode (the
default) runs inline, so behavior is unchanged unless a user sets
CODEX_AUTH_ROUTING_MUTEX=enabled.

Tests: assert the mode propagates for enabled + legacy-default at startup.
Verified: tsc + lint clean; full suite 271 files / 4097 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: fix version/override drift + guard it (docs-supplychain-03/04)

- AGENTS.md: bump the stale header version 2.0.1 -> 2.1.13-beta.2
- SECURITY.md: correct the hono override rationale (4.12.14 -> 4.12.18)
  to match the actual package.json pin
- documentation.test.ts: add a parity test asserting SECURITY.md cites
  the same hono override version as package.json, so this drift cannot
  silently recur

Verified: documentation suite 25 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(recovery): quarantine + surface corrupt session files (recovery-10)

readMessages/readParts previously skipped unparseable session files with
a bare `continue`, so corrupt recovery data was dropped with no signal
(also recovery-04). Now:
- quarantineCorruptFile() renames the bad file to a `.corrupt-<ts>`
  sibling (preserved for inspection, not deleted), with a graceful
  fallback if the rename is blocked (e.g. Windows lock)
- a process-level corruption count + quarantined-path list is exposed via
  getRecoveryCorruptionStats() so callers can report it

Regression test asserts the rename + stats for a corrupt message file.
Verified: tsc + lint clean; full suite 271 files / 4098 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(config): retry transient FS locks on the config load path (config-04)

loadPluginConfig() is synchronous, so a transient EBUSY/EPERM/EAGAIN on
the legacy-file readFileSync fell straight through to the catch and
silently reverted to DEFAULT_PLUGIN_CONFIG, discarding the user's real
settings. Add readFileSyncWithConfigRetry (sync, Atomics.wait backoff,
retries only the transient lock codes; ENOENT and SyntaxError propagate
unchanged) mirroring the async retry already used by the unified path.

Tests: transient EBUSY is retried and the setting survives; a
non-transient code falls back to defaults without retrying.
Verified: tsc + lint clean; full suite 271 files / 4100 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(storage): unify fs-retry code sets (storage-07)

Three bespoke retry loops used divergent retryable-code subsets that all
omitted ENOTEMPTY (and EACCES):
- account-clear.ts isRetryableFsError (EBUSY/EPERM only)
- flagged-storage-io.ts RETRYABLE_UNLINK_CODES (EBUSY/EAGAIN/EPERM)
- import-export.ts renameExportFileWithRetry (EPERM/EBUSY/EAGAIN)

Route all three through the shared lib/fs-retry.ts set
(shouldRetryFileOperation / FILE_RETRY_CODES). Each loop keeps its own
attempt count and backoff; only the retryable-code decision is unified,
so the existing count-specific tests are unaffected.

Tests: account-clear parametrized retry test extended to ENOTEMPTY/EACCES.
Verified: tsc + lint clean; full suite 271 files / 4102 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(prompts): SHA-256 cache integrity + atomic cache writes (prompts-03/06)

codex.ts cached the GitHub prompt with a non-atomic parallel writeFile of
content + meta (torn-file risk on crash) and trusted the disk cache
blindly (cache-poisoning -> persistent prompt-injection vector).

- writeCacheAtomically: temp + rename for both files, content written
  before meta so the meta's sha always describes an on-disk content file
  (prompts-06)
- CacheMetadata gains an optional sha256; the disk cache is verified
  against it on read and discarded + refetched on mismatch (prompts-03).
  Backward-compatible: caches without a sha are still accepted.

Tests: sha-match serves the cache without a fetch; sha-mismatch discards
and refetches. Verified: tsc + lint clean; full suite 271 files / 4104
tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(recovery): guard sort, validate mutate paths, honest strip result

- recovery-02: readMessages sort comparator dereferenced a.id/b.id outside
  the per-file try/catch, so a parseable record missing `id` threw out of
  the sort and crashed the read. Guard the id access.
- recovery-03: injectTextPart / prependThinkingPart / stripThinkingParts /
  replaceEmptyTextParts joined messageID into a filesystem path with no
  validatePathId (only the read path validated). Add the same guard so a
  crafted messageID cannot escape PART_STORAGE.
- recovery-05: stripThinkingParts reported success when ANY part was
  removed even if a TARGETED thinking part failed to delete, so auto-resume
  treated the message as clean and retried forever (burning quota). Now it
  returns true only when every targeted thinking part was actually removed.

Tests: missing-id no-throw; unsafe-messageID rejection on all four mutate
helpers; strip returns false on a failed targeted delete.
Verified: tsc + lint clean; full suite 271 files / 4107 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(quota): align capability key + add quota-window staleness escape

- quota-forecast-01: evaluateRuntimePolicy read the capability store under
  getAccountPolicyKey, but the recordUnsupported sites WRITE under
  resolveEntitlementAccountKey. The keys never matched, so unsupported-model
  suppression was dead. Read under the same entitlement key. Regression test
  records an unsupported model and asserts the account is blocked.
- quota-forecast-02: a quota window at 100% used with no resetAtMs read as
  exhausted forever. When updatedAt + windowMinutes have elapsed, treat the
  window as rolled over (conservative synthesized expiry). Regression tests
  cover before/after the window boundary.

Not changed (documented): quota-forecast-04 (sharing resolveNormalizedModel
between entitlement-cache and the capability matrix is unsafe — the canonical
normalizer collapses aliases like gpt-5-codex -> gpt-5.3-codex for routing,
but entitlement identity must keep them distinct; the existing tests confirm
this, so the two normalizations are intentionally separate). quota-forecast-05
(persisting in-memory capability/scheduler state) is a larger change left for
a follow-up.

Verified: tsc + lint clean; full suite 271 files / 4110 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(ui): color-bleed, display-width alignment, glyph-mode bar (ui-01/02/03)

- ui-01: truncateAnsi appends a reset when the kept text contains an ANSI
  escape, so a color opened before the cut cannot bleed past the
  truncation point into the rest of the line.
- ui-02: add a dependency-free display-width helper (CJK/fullwidth/hangul/
  emoji = 2 cols, combining marks/ZWJ = 0) and pad+truncate table cells by
  display columns instead of UTF-16 code units, so CJK/emoji content stays
  aligned and wide glyphs are never split across the boundary.
- ui-03: the quota bar honors glyphMode (Unicode block glyphs only in
  unicode mode, ASCII #/- otherwise) instead of hardcoding block glyphs
  that render as mojibake on ascii terminals.

Tests: 8 display-width unit tests + CJK table-alignment/truncation
regressions. Verified: tsc + lint clean; full suite 272 files / 4120 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(settings): single-source the refresh-interval bounds (settings-hub-01)

The experimental settings panel hardcoded proactiveRefreshIntervalMs
bounds (min 60000, step 60000) that contradicted the backend settings
schema (min 5000, step 5000) for the same setting, so the two panels
clamped/stepped the value differently. Derive the panel's min/max/step
from the single BACKEND_NUMBER_OPTION_BY_KEY schema entry so they cannot
diverge.

Updated the tests that encoded the old divergent 60000 step (3 in
settings-hub-utils, 2 in codex-manager-cli that drive the same hotkeys
through the full login flow).

Note: settings-hub-02 was investigated and is NOT a bug against current
code — savePluginConfig already reads-merges under withConfigSaveLock and
buildBackendConfigPatch emits only scoped backend keys, so there is no
full-snapshot lost-update window.

Verified: tsc + lint clean; full suite 272 files / 4120 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(request): log deprecation/sunset headers on the error path too (request-01)

RFC 8594 Deprecation/Sunset headers were logged only by
handleSuccessResponse. Upstream often attaches them to error responses
(e.g. a sunset endpoint returning 4xx/410), so they were silently dropped
there. Extract a shared logDeprecationHeaders helper and call it from both
handleSuccessResponse and handleErrorResponse.

Test: an error response carrying a Sunset header logs the deprecation warning.
Verified: tsc + lint clean; full suite 272 files / 4121 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(ci): deterministic property seed + coverage gate on PRs

- tests-ci-06: pin a deterministic fast-check seed (override via
  FAST_CHECK_SEED) so property-test failures are reproducible from CI logs
  instead of using a fresh random seed each run.
- tests-ci-05: PR CI runs `npm run coverage` instead of `npm test`, so the
  80% coverage threshold gates PRs and not only the post-merge push-to-main
  run in ci.yml.

Guard tests assert the pinned seed and the PR coverage step.
Verified: tsc + lint clean; full suite 272 files / 4123 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(cli): add --json to status/list (cli-manager-03)

status and list are the primary inspection commands but only emitted
human text. Add a --json/-j branch that prints a single machine-readable
object (storage path/health, account count, active/pinned/recommended
indices, runtime-in-use index, and a per-account array with markers) built
from the same data the text path renders. The empty-storage path emits a
minimal JSON object too.

Tests: populated + empty storage emit exactly one JSON object with the
expected shape. Verified: tsc + lint clean; full suite 272/4125 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(chatgpt-import): guard planOcChatgptSync against load/preview throws (chatgpt-import-06)

planOcChatgptSync let loadTargetStorage (JSON-parse of an external file)
and previewMerge throw uncaught, while the sibling applyOcChatgptSync
wraps everything and returns a structured result. A corrupt/unreadable
target file therefore crashed planning instead of surfacing a friendly
status.

Add a "plan-error" result variant carrying the error + cause (load|preview);
applyOcChatgptSync maps it onto its existing error variant, and the
experimental settings panel surfaces the real failure message instead of a
generic "unavailable".

Tests: plan-error on loader throw (cause: load) and on preview throw
(cause: preview). Verified: tsc + lint clean; full suite 272/4127 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(config): make load precedence symmetric with save (config-02)

savePluginConfig writes to CODEX_MULTI_AUTH_CONFIG_PATH first when that env
var is set, but loadPluginConfig preferred unified settings and only fell
to the env/legacy path when unified was absent. With both an env path and a
unified file present, a save went to the env path while the next load read
unified — a split-brain where the saved value was invisible.

loadPluginConfig now reads the env path first when it is set and exists,
mirroring the save precedence (env > unified > legacy).

Test: with the env path set, load returns the env file's value (not unified).
Verified: tsc + lint clean; full suite 272/4128 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(local-bridge): allow auth to an auth-enabled runtime proxy (runtime-proxy-03)

The local bridge had no way to present a client token, so it could not talk
to a runtime proxy that requires a per-process client key. Add a
runtimeClientApiKey option; when set, the bridge replaces the inbound (already
bridge-validated) Authorization with that key on the forwarded request. When
unset, inbound Authorization is stripped rather than forwarded verbatim, so the
caller's bridge token is never leaked upstream (reinforces runtime-proxy-02).

Tests: forwarded request carries the configured runtime key; inbound auth is
stripped when no key is set. Verified: tsc + lint clean; full suite 272/4130.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(scripts): wire the preuninstall lifecycle hook (install-scripts-02)

scripts/preuninstall.js was shipped in files[] and unit-tested but had no
"preuninstall" entry in package.json scripts, so it never ran on
npm uninstall — dead cleanup. Wire it as the lifecycle hook next to
postinstall. The script is already self-guarding (its top-level catch forces
exitCode 0), so it cannot fail an uninstall.

Test: package.json registers preuninstall -> scripts/preuninstall.js and
ships the script. Verified: tsc + lint clean; full suite 272/4131 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(scripts): detect unlisted vendor files in provenance check (install-scripts-01)

verify-vendor-provenance only hashed the files listed in the manifest, so a
rogue/extra file added to a vendored dir passed silently — provenance proved
the listed files were intact but never that the directory contained nothing
else. After verifying listed files, recursively enumerate each component's
root and fail if any on-disk file is not declared in vendor/provenance.json.

Verified empirically: a planted vendor/codex-ai-sdk/dist/rogue.js makes the
verifier exit 1 with an "Unlisted vendor file(s)" error; removing it restores
the clean "2 component(s), 8 file(s) verified" pass. Lint clean; full suite
272 files / 4131 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(review): address greptile + coderabbit findings on audit PR #499

Resolve all 34 review threads plus the greptile P2 from the audit-remediation
PR, split into source fixes and the regression tests reviewers asked for. Full
suite: 4104 -> 4165 tests, tsc + tsc:scripts + eslint clean.

Source fixes:
- logger: clearCorrelationId stores null (not "") inside an ALS scope so
  getCorrelationId honors its string|null contract
- oc-chatgpt-orchestrator: create the merged-token parent dir 0o700 (matches
  the 0o600 file) so it is not world-listable
- context-overflow: add random entropy to responseId (was Date.now() only)
- prompts/fetch-utils: mandatory User-Agent/Accept now win over caller headers;
  drop leftover // PLACEHOLDER_READ_BODY artifact
- prompts/codex: route writeCacheAtomically through withFileOperationRetry
  (windows EBUSY/EPERM/ENOTEMPTY) and document the two-rename window
- runtime-rotation-proxy: maskString the logged error.message (matches lastError)
- recovery/storage: only quarantine on real corruption; skip transient
  EBUSY/EPERM/EACCES/EAGAIN/ENOENT read races; quarantine rename now retries
- debug-bundle: path-aware home redaction (case-fold on win32, require a path
  boundary so /users/alice2 != /users/alice)
- local-bridge: reject runtimeClientApiKey when requireAuth is false
- config: explain report mirrors loadPluginConfig env-path precedence; sync
  read retry aligned to 5 attempts
- verify-vendor-provenance: fail closed on symlinks / unsupported dirent types
- codex-manager: move ACCOUNT_MANAGER_COMMANDS to a shared internal module

Regression tests added for every behavioral change, plus the coverage-only asks:
token-bucket+breaker reset on removeAccount, storage-parser EBUSY retry,
unpin/workspace/uninstall routing, non-loopback (0.0.0.0) proxy opt-in, oauth
port fail-loud teardown, oc-orchestrator removeWithRetry cleanup, SECURITY.md
rollup pin parity, status/list -j/--json CLI plumbing, quota secondary-window +
boundary + invalid-entry cases, ui truncateAnsi reset placement, display-width
explicit zero-width escapes, bidirectional config-explain parity, and dropped
redundant vitest global imports.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(review): second-pass review findings + deferred items (PR #499)

Address the CodeRabbit re-review threads on 6e04749 plus the two deferred
roadmap items. Full suite 4165 -> 4169 tests, tsc + eslint clean.

Re-review threads:
- runtime-rotation-proxy: add a regression that injects a token+email-bearing
  error and asserts getStatus().lastError is masked (errors-logging-08 had no
  redaction test guarding the catch path)
- prompt-fetch-utils: the abort-vs-resolve race test used a 5ms real wait
  against a 1000ms timeout, so a leaked timer would stay green; switch to fake
  timers and advance past the full timeout
- storage/paths (storage-02): add symlink-escape cases asserting the specific
  'resolves (via symlink) outside' message, a parent-prefix (non-leaf) symlink
  escape, and a case-only realpath that must not be a false escape

Deferred roadmap items:
- request-10: enable no-console for lib internals so stray output that should go
  through the masking logger is caught; allowlist the genuine CLI/UI output
  surface (index.ts, lib/cli.ts, lib/codex-manager/**, lib/auth/device-auth.ts)
  and mark logger.logToConsole (the single sanctioned, mask-first sink) with a
  scoped eslint-disable
- tests-ci-16: switch vitest to the forks pool with singleFork to eliminate the
  intermittent Windows worker_threads crash (kept single-worker semantics for
  the fixed-port OAuth suite); full suite still ~113s

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(review): deterministic paths test + Vitest 4 pool config (PR #499)

- test/paths.test.ts: drop the case-only realpath case — it used try/catch that
  passed whether resolvePath threw or not (case-folding is host-dependent:
  normalizePathForComparison only lower-cases on win32), so it guarded nothing.
  The escape-message, parent-prefix-escape, and allows-same-root cases already
  cover storage-02 deterministically.
- vitest.config.ts: the round-2 `poolOptions.forks.singleFork` shape was removed
  in Vitest 4 (the repo runs 4.0.18) and emitted a deprecation warning while
  silently no-opping. Vitest 4 already defaults to the forks pool; pin it
  explicitly and use the top-level `fileParallelism: false` (the v4 replacement
  for singleFork) to keep single-worker semantics for the fixed-port suites.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(security): close 5 audit findings from adversarial review (round 4)

Independent multi-agent review surfaced 5 real bugs the bot reviewers
missed, two of them security issues. All fixed with regression tests.

HIGH — prompts/codex.ts: 304 revalidation integrity bypass
  A 304 on the conditional refetch re-read disk and unconditionally
  re-blessed its sha256, so a tampered cache body could be laundered back
  into use. Gate the 304 re-serve on `cachedETag` (we only trust a 304 we
  asked for) and require the on-disk bytes to still match the prior sha
  before serving; otherwise throw so the caller falls back to bundled
  instructions. Also track `usableDiskContent` separately so a body that
  fails integrity is never served, used as the 304 body, or used as the
  offline catch fallback.

HIGH — storage/paths.ts: false symlink-escape on a symlinked root
  The symlink guard compared the resolved path only against raw roots, so
  a file legitimately under a root that is itself a symlink was rejected.
  Canonicalize the home/projectRoot/tmp roots and only throw when the path
  escapes both the raw AND canonical roots.

MEDIUM — prompts/fetch-utils.ts: unbounded body read
  readBodyTextGuarded had no idle timeout, so a server that sent headers
  then stalled mid-body hung the request path forever (the fetch-level
  AbortSignal only covers connect+headers). Race each reader.read() against
  a per-read idle timeout; on timeout cancel the reader and reject.

MEDIUM — local-bridge.ts: IPv6 loopback rejected
  isLoopbackHost did not treat "::1"/"[::1]" as loopback, so a valid IPv6
  runtime-proxy URL was refused at startup. Match the bracketed and bare
  forms, mirroring the runtime-rotation-proxy sibling.

LOW — refresh-lease.ts: 0o700 mode no-op on an existing lease dir
  mkdir({mode}) does not tighten an already-existing dir; chmod the lease
  dir to 0o700 after mkdir on non-win32 so a pre-existing loose-perm dir is
  hardened.

Also from the same review (cli-ui-logger-scripts batch):
  - logger.ts: sensitive `email` KEY was masked with maskToken (leaked the
    local part + TLD); use maskEmail. logToConsole now strips CR/LF like
    logToApp to prevent log-line injection.
  - debug-bundle.ts: the --json bundle embedded the raw config report
    (configPath leaked the OS username; entry values could carry a proxy
    URL with user:pass creds) and the raw activeAccountId. Redact
    configPath, route entry values through sanitizeValue, and mask
    activeAccountId.

typecheck + lint clean; full suite 4178 passed / 1 skipped.

* fix(recovery,prompts): address round-4 CodeRabbit findings (round 5)

MAJOR — recovery/storage.ts: quarantine parseable-but-invalid records
  readMessages/readParts pushed any JSON that parsed, so a record missing a
  string `id` (messages) or string `id`/`type` (parts) survived into the
  result and could crash a later id-based sort/index (e.g.
  findMessagesWithOrphanThinking's `a.id.localeCompare(b.id)`). Validate the
  minimal shape each reader relies on and route failures through
  handleUnreadableFile so malformed records are quarantined like corruption
  instead of surviving until a downstream crash.

MAJOR — prompts/codex.ts: guard getLatestReleaseTag body reads
  The fetch AbortSignal only covers connect+headers, so `response.json()` /
  `htmlResponse.text()` in getLatestReleaseTag could hang on a mid-body stall.
  Add `withBodyTimeout` (fetch-utils) and wrap both reads. It adds only the
  hang guard (no size/empty checks) so the small release-metadata reads are
  bounded without changing their semantics.

MINOR — prompts/codex.ts: retry temp-file cleanup on Windows
  writeCacheAtomically's finally cleanup did a bare `fs.rm(...).catch()`,
  bypassing withFileOperationRetry, so a transient EBUSY/EPERM/ENOTEMPTY/
  EACCES from AV/the indexer could leak the *.tmp sibling. Route cleanup
  through withFileOperationRetry too.

NIT — codex-manager/commands/status.ts: de-dupe marker builder + json shape
  The json and text paths rebuilt the identical marker sequence (current /
  disabled / rate-limited / 429-from-quota / quota-exhausted / cooldown),
  which would silently diverge if a marker were added to one branch only.
  Extracted a single buildAccountMarkers() used by both. Also emit the same
  keys (activeIndex/pinnedAccountIndex/recommendedIndex/recommendationReason/
  runtimeInUseIndex, as null) in the empty-storage --json branch so a --json
  consumer sees one stable shape.

Regression tests: invalid-record quarantine (messages + parts), stalled
release-body timeout, temp-file rm EBUSY retry, withBodyTimeout unit tests.
Updated the prior "missing id survives" recovery test to assert the record is
now quarantined (the behavior CodeRabbit asked for).

typecheck + lint clean; full suite 4184 passed / 1 skipped.

* test(rotation,status): add coverage requested in round-4 review (round 6)

Targeted unit coverage for the CodeRabbit test-gap findings:

- rotation.ts: direct HealthScoreTracker.clearAccountKey and
  TokenBucketTracker.clearAccountKey tests — clears every quota-key variant
  for one identity while leaving other identities untouched, plus the
  numeric→string key normalization. Previously only covered indirectly via
  the removeAccount regression in accounts.test.ts.

- codex-manager status/list --json: added the `auth list` / `auth status`
  wrapper form (-j/--json) to the plumbing matrix; the bare status/list form
  was covered but the auth-prefixed wrapper route (codex-manager.ts:3547) was
  not. Also assert the empty-storage --json branch now emits the stable shape
  (activeIndex/pinnedAccountIndex/recommendedIndex/recommendationReason/
  runtimeInUseIndex as null).

typecheck + lint clean; full suite 4192 passed / 1 skipped.

* fix: harden the deferred audit items (round 7)

Closes the residual items flagged but previously deferred as out-of-PR-scope.

prompts/fetch-utils.ts — withBodyTimeout now actually cancels the read
  Previously it only rejected on timeout while the stalled body kept consuming
  the connection until GC/socket close. It now takes the Response and calls
  response.body.cancel() on timeout so the stream is torn down. For mocks /
  fetch impls without a streamable body the cancel is a no-op and the timeout
  still rejects. getLatestReleaseTag updated to pass the response.

ui/display-width.ts — grapheme-cluster aware width
  codePointWidth summed per code point, so ZWJ emoji sequences (👨‍👩‍👧) were
  overcounted (6 instead of 2) and combining marks outside U+0300–036F (Arabic,
  Hebrew, Thai, Cyrillic, Syriac, etc.) counted as width 1. Added those
  zero-width ranges and a cluster walker that collapses ZWJ-joined emoji,
  emoji+skin-tone modifiers, and regional-indicator flag pairs to a single
  2-wide glyph. ZWJ between non-emoji is still treated as a zero-width control
  (so the existing a‍b → width 2 contract holds). truncateToWidth never splits
  a cluster.

ui/select.ts — truncateAnsi measures display columns
  It budgeted by UTF-16 code units (.length), so a CJK/emoji menu label could
  overflow `columns` and wrap to an extra physical row, desyncing the render's
  up-cursor line accounting and progressively corrupting the redraw. It now
  measures with displayWidth and advances per code point.

logger.ts — drop the event-loop-blocking Atomics.wait sleep
  ensureLogDir slept via Atomics.wait on EBUSY/EPERM, freezing ALL in-flight
  requests on the concurrent proxy for up to ~30ms. Replaced with immediate
  (non-blocking) retries and a cached "dir ready" flag so the steady state does
  a single existsSync and a transient lock no longer stalls the event loop.

oc-chatgpt-orchestrator.ts — rename retry + POSIX perm re-assert
  persistMergedDefault's atomic rename now routes through withFileOperationRetry
  (the destination is a live, watched store that can surface a transient
  EBUSY/EPERM on Windows). On POSIX it re-chmods the dir (0o700) and temp file
  (0o600) since mkdir/writeFile modes are ignored on an existing dir; win32
  relies on the user-profile ACL like the main account store.

codex-manager.ts — thread org id instead of mutating process.env
  runAuthLogin mutated the global CODEX_AUTH_ACCOUNT_ID for the duration of a
  login (racy on concurrent re-entry). The org is now passed explicitly to
  resolveAccountSelection (explicit arg wins, env still honored as fallback so
  the runtime-proxy mechanism is unchanged), and the env mutation is gone.

Regression tests: ZWJ/flag/skin-tone/non-Latin-combining width + cluster-safe
truncation; withBodyTimeout stream-cancel on timeout. typecheck + lint clean;
full suite 4198 passed / 1 skipped.

* fix: resolve round-7 CodeRabbit findings (round 8)

Regressions I introduced earlier + pre-existing bugs surfaced by the re-scan,
all fixed with regression tests.

Regressions from my prior rounds:
- debug-bundle.ts redactHome: on win32 it compared `/` and `\` literally, so a
  forward-slash path (c:/users/alice/...) bypassed the home-prefix redaction and
  leaked the username. Now folds case AND separator before comparing.
- ui/display-width.ts: the ZWJ fast-path keyed off `width === 2`, so it wrongly
  collapsed a ZWJ between non-emoji wide chars (漢‍字 → 2 instead of 4). Now
  gated on emoji/pictographic code points only.
- oc-chatgpt-orchestrator.ts: the temp-file cleanup unlink was still single-shot
  after the (already-retried) rename; route it through withFileOperationRetry so
  a transient Windows lock can't leave a secret-bearing .tmp behind.

Pre-existing bugs:
- local-bridge.ts: an inbound `x-api-key` was forwarded upstream alongside the
  stripped Authorization, leaking the local client credential across the bridge.
  Strip it too.
- accounts.ts: account removal cleared tracker state under the recomputed
  identity key, not the stable runtime tracker key state is written under, so an
  account enriched after first-track left stale health/token entries a re-add
  inherited. Clear under getRuntimeTrackerKey (and the identity key when it
  differs).
- config.ts: getPluginConfigExplainReport read via a single readFileSync, so a
  transient Windows lock reported storageKind:"unreadable" even when
  loadPluginConfig succeeded on retry. Route through readFileSyncWithConfigRetry.
- runtime-rotation-proxy.ts: IPv6 loopback was inconsistent — server.listen needs
  the raw `::1` while the baseUrl needs bracketed `[::1]`. Normalize once at
  startup (bindHost raw, urlHost bracketed).
- table-formatter.ts: a zero-width column still emitted `…`, overflowing the
  declared width by one and desyncing the row from the header. Short-circuit
  width<=0 to "".
- experimental-settings-prompt.ts: the refresh-interval label rounded to whole
  minutes after the menu moved to a 5s step, so 25_000ms rendered "0 min". Use
  formatWaitTime for sub-minute granularity.
- package.json: removed the `preuninstall` lifecycle hook — npm@7+ never fires
  it (the codebase already documents this in commands/uninstall.ts), so it was
  dead config. The script stays shipped for the explicit `uninstall` command;
  flipped the package-bin test to assert the hook is NOT wired.

codex-manager.ts org-override: the explicit `login --org` now also treats a
blank/whitespace value as absent so it falls back to CODEX_AUTH_ACCOUNT_ID
(surfaced by the new regression).

Coverage added (CodeRabbit-requested): org-override contract (new
test/codex-manager-org-override.test.ts), glyph-mode quota bars (new
test/auth-menu-quota-bar.test.ts), raw-token leak assertions + a realistic
sanitizeValue mock in the debug-bundle cli test, mixed-separator redactHome,
non-emoji ZWJ width, zero-width table column, x-api-key strip, tracker-key
cleanup after enrichment, config-explain transient-lock retry, IPv6 proxy
startup, sub-minute interval label, usage-ledger-rotation budget, windows
drive-letter symlink case, storage-parser EPERM retry, and an unconditional
oauth-server teardown port wait. Doc comments on paths.ts loop bound + canonical
walk cost.

typecheck + lint clean; full suite 4217 passed / 1 skipped.

* fix: resolve round-8 CodeRabbit findings (round 9)

CRITICAL — config.ts: drop the Atomics.wait blocking sleep on the sync path.
  readFileSyncWithConfigRetry slept via Atomics.wait, and loadPluginConfig() is
  synchronous + runs on startup, so a transient Windows lock froze the entire
  event loop for up to ~150ms. Replaced with immediate (non-blocking) retries; a
  brief exclusive lock clears within a tick and persistent failure falls through
  to the existing unreadable/default handling. The async config path already used
  await sleep and is unchanged.

MAJOR (security) — recovery/storage.ts: reject path-unsafe message/part ids.
  The structural validators only checked `id` was a string, so `{ "id":
  "../poison" }` survived readMessages/readParts and would feed a traversal into
  readParts(msg.id). Both validators now also require SAFE_ID_PATTERN, so a
  traversal id is quarantined like corruption.

MAJOR — local-bridge.ts: normalize IPv6 loopback for bind vs baseUrl.
  isLoopbackHost accepted "[::1]" but startLocalBridge used the raw host for both
  server.listen (needs "::1") and baseUrl (needs "[::1]"), so "[::1]" failed the
  bind and "::1" produced "http://::1:port". Normalize once (bindHost raw, urlHost
  bracketed); request URL + returned host/baseUrl all use the right form. Mirrors
  the runtime-rotation-proxy fix.

MAJOR — logger.ts: invalidate the logDirReady cache on ENOENT writes.
  ensureLogDir caches "ready" after first success; if LOG_DIR is later deleted,
  writeFileSync fails ENOENT forever. Reset the cache on a directory-missing write
  failure so the next logRequest recreates the dir.

MAJOR — codex-manager.ts: stop exporting the CLI-internal resolveAccountSelection.
  Extracted the org-override precedence into lib/auth/org-override.ts
  (resolveOrgOverride) so the concurrency contract is unit-testable without
  widening the CLI module's public surface; resolveAccountSelection is internal
  again and delegates to it.

MINOR — ui/select.ts: restore the cursor even if setRawMode throws.
  stdout.write(ANSI.show) shared a try with the fragile setRawMode, so a throw
  there left the terminal cursor hidden. Each teardown step now has its own try.

Tests: unsafe-id quarantine (messages + parts), config-explain transient-lock
already covered, logger ENOENT cache-reset, IPv6 bridge bind/baseUrl (::1 and
[::1]), resolveOrgOverride precedence (replacing the internal-import test),
flagged-storage + export rename retries for ENOTEMPTY/EACCES. Dropped explicit
vitest imports in the org-override test.

typecheck + lint clean; full suite 4228 passed / 1 skipped.

* fix: resolve round-9 CodeRabbit findings (round 10)

Code:
- config.ts: resolvePluginConfigPath() treated a set-but-non-existent
  CODEX_MULTI_AUTH_CONFIG_PATH as the path unconditionally, so loadPluginConfig()
  threw ENOENT in the fallback and collapsed to defaults — masking a real legacy
  config on disk. Treat a missing env override as absent. (savePluginConfig reads
  the env var directly, so save-to-new-env-path is unaffected.)
- prompts/codex.ts: a legacy cache entry with NO sha256 was trusted (served
  as-is, and a 304 minted a digest over un-vetted bytes). Treat missing-sha as
  unverified: don't fast-path serve, don't send/accept conditional revalidation
  (require a prior sha to trust a 304); force a full 200 to mint the first digest,
  keeping old bytes only as an offline fallback.
- recovery/storage.ts: validators now reject a path-unsafe string id
  (SAFE_ID_PATTERN) and a non-numeric time.created — both previously survived and
  caused a path-traversal read / a NaN sort that mis-ordered recovery.
- runtime-rotation-proxy.ts: removed the allowNonLoopbackHost escape hatch. The
  proxy forwards managed OAuth tokens and is now loopback-only with no opt-out.
- storage/import-export.ts: the staged-export temp cleanup was single-shot; route
  it through the shared retry so a transient Windows lock can't strand a
  secret-bearing .tmp next to the destination.
- ui/display-width.ts: emoji-presentation clusters (U+FE0F) and keycaps (U+20E3)
  now count as width 2 (☀️ ❤️ 1️⃣), so table/menu alignment is correct.

Tests: config legacy-fallback; no-sha cache forces full GET / offline fallback /
no If-None-Match (and repaired a stale-while-revalidate test that relied on the
old no-sha trust); unsafe-id + non-numeric-time quarantine; non-loopback proxy
bind refused unconditionally; export temp-cleanup retry; emoji-presentation +
keycap widths. Test-contract fixes: vitest-globals + tty-descriptor restore in
auth-menu-quota-bar; x-api-key strip on the auth-enabled bridge path; real
temp+rename atomicity assertion; numeric/string key-normalization coverage;
storage-parser string (not Buffer) reads; Accept-header assertion; schema-derived
guardian step; IPv6 teardown port probe.

typecheck + lint clean; full suite 4236 passed / 1 skipped.

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
N
Neil Daquioag committed
d930ac8eb6a53e079eb4e09e1c075d53894bffd4
Parent: 70ab38a
Committed by GitHub <noreply@github.com> on 6/2/2026, 4:49:00 AM