SIGN IN SIGN UP

fix(039): resolve 13 QA findings in security scanner plugin system (#372)

* fix(039): resolve 13 QA findings in security scanner plugin system

Bundled fix for bugs surfaced by a full QA pass of Spec 039. Each finding has
a one-line code reference and a regression test where applicable.

Critical:
* F-01 security approve now actually unquarantines the server. The scanner
  Service gets a new ServerUnquarantiner dependency injected from
  server.Server.UnquarantineServer; ApproveServer calls it after persisting
  the integrity baseline. Tool indexing and quarantine-bucket removal follow
  the existing unquarantine path. Critical-findings guard still blocks the
  call before anything mutates. TestServiceApproveServerCallsUnquarantiner
  and TestServiceApproveServerCriticalDoesNotUnquarantine pin the contract.
* F-02 Source resolver no longer misreads a server's positional data-dir
  argument as source code. For package-runner commands (npx, uvx, pipx,
  pnpm dlx, bunx, yarn dlx) resolveFromPackageCache runs first; arg-scan
  fallback only accepts directories containing a source marker
  (package.json, pyproject.toml, setup.py, Cargo.toml, go.mod, or a source
  file). Regression tests cover the filesystem-server false-positive case
  and the legitimate python-script arg case.

High:
* F-03 macOS keychain probe no longer pops the "Keychain Not Found" modal.
  KeyringProvider.IsAvailable uses a read-only Get probe with a 2s timeout
  and a headless-environment fast path instead of keyring.Set. Public Get,
  Store, Delete, List, and both registry helpers are wrapped in a new
  runWithTimeout helper (3s hard deadline) backed by ErrKeyringTimeout.
  Scanner ConfigureScanner already falls back to in-config storage on Store
  errors. TestKeyringProvider_IsAvailable_HangingBackend proves the bail-out.
* F-04 UI Approve buttons are now scanner-aware. ServerCard, ServerDetail
  and ScanReport call the new securityApproveServer store action. A custom
  modal gates force-approve on servers with no scan or with critical
  findings; the legacy unquarantineServer path is retained only for admin
  and the Scan First action.
* F-05 CLI security scan no longer hangs after the job reports completed.
  The poll loop now terminates on completed|failed|cancelled and prints live
  progress lines (N run, M running, F failed).

Medium:
* F-06 --dry-run prints a structured plan (source method, source path,
  scanner images, commands, timeouts) and exits without invoking the scan
  endpoint. No containers start.
* F-09 Scanner status vocabulary unified between table and JSON output
  (available | pulling | installed | configured | error).
* F-10 security report text output now contains "Scanners: X run, Y failed
  (names) of Z" and a coverage warning when any scanner did not run, so a
  user cannot mistake scanner crashes for "clean".
* F-11 Scan history endpoint returns the aggregated risk_score instead of
  always zero.

UX:
* F-12 Dashboard "Security Scan" chip is now a live router-link to /security
  with no "soon" placeholder; matches adjacent Docker / Quarantine chips.
* F-14 security overview shows "Last scan: never" instead of
  0001-01-01 00:00:00, and JSON emits last_scan_at: null.
* F-15 security configure returns in ~3s (was 60s+), bumped client timeout
  from 10s to 60s as additional safety, added existence prefetch so typos
  return 404 fast.
* F-16 scan --all redraws the progress table in place on TTY via ANSI
  cursor escapes; falls back to per-line output when piped; FINDINGS column
  now populated from findings_count.

Extra:
* mcpproxy security subcommands now honor --config and --data-dir global
  flags (newSecurityCLIClient checks the package-level configFile/dataDir
  before calling config.LoadFromFile, mirroring main.go).

Not fixed (documented):
* F-07 ramparts scanner container ships a GLIBC_2.39-linked binary that
  fails on arm64 macOS. Upstream scanner image issue, not mcpproxy code.
* F-13 cisco-mcp-scanner stdout contains a hardcoded server_url header.
  Cosmetic upstream scanner output quirk.

Post-fix revalidation: 6 of 7 scanners run end-to-end on arm64 macOS
against the three test servers (everything, fetch-test, filesystem-test).
The QA report in docs/qa/security-scanners-2026-04-10.html now contains a
§11 "After-Fix Revalidation" section with per-finding verification
evidence, updated code-landscape table, and reproduction instructions.

Tests: internal/secret, internal/security, internal/security/scanner,
cmd/mcpproxy, frontend unit tests all pass. golangci-lint: 0 issues.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(039): never call keyring.Set from scanner configure on macOS (F-03 follow-up)

The previous fix in this branch removed the IsAvailable probe that was
popping the "Keychain Not Found" system modal, but the modal continued to
appear on the user's machine during every `mcpproxy security configure`
call with the real scanner_<id>_<env> key (screenshotted).

Root cause: my runWithTimeout helper wrapped keyring.Set in a goroutine
and returned ErrKeyringTimeout after 3s if the goroutine didn't finish.
That protects the server process from blocking but does NOT cancel the
goroutine — Go has no goroutine cancellation, and the underlying
Security.framework call continues running in the background. While it
runs, macOS keeps the modal on the user's screen. Worse: each subsequent
Store call spawns another zombie goroutine.

Fix: eliminate the keyring.Set call from the scanner configure path
entirely.

1. internal/security/scanner/service.go — ConfigureScanner now stores
   scanner env values directly in the scanner's ConfiguredEnv map in
   BBolt. No SecretStore call. Scanner env vars end up in the scanner
   container's /proc/environ at scan time anyway, so keyring storage adds
   no meaningful confidentiality. Users who want keyring-backed storage
   for a specific secret can still pass a `${keyring:my-name}` reference
   as the env value — the resolver expands it via a read-only Get at
   scan time, which is safe.

2. internal/secret/keyring_provider.go — KeyringProvider.Store now
   refuses to call keyring.Set on darwin by default. The opt-in is
   MCPPROXY_KEYRING_WRITE=1 (env) or SetWritesEnabled(true) (programmatic,
   for tests). On Linux/Windows the default is unchanged. The previous
   three-layer guard (known-unavailable cache, first-time probe, failure
   cache) still applies behind the macOS gate for the opt-in case.

3. internal/secret/keyring_provider_test.go — new regression test
   TestKeyringProvider_Store_MacOSDefaultRefuses pins the exact user
   scenario (configure mcp-scan SNYK_TOKEN on macOS) and asserts that
   keyring.Set is NOT called. Other Store tests updated to explicitly
   opt in via SetWritesEnabled(true) so they exercise the three-layer
   guard path regardless of runtime.GOOS.

Verified on arm64 macOS: `time mcpproxy security configure mcp-scan --env
SNYK_TOKEN=...` returns in 0.041s (was 3s, before that 60s+), emits zero
keyring log lines, and does NOT pop the macOS modal. Subsequent scan of
the "everything" server runs mcp-scan successfully using the token
pulled from ConfiguredEnv.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(039): document the security scanner system and add an extensive CLI reference

The existing docs/features/security-scanner-plugins.md was out of date
relative to the actual v0.23 scanner system:

* Listed only 4 of the 7 bundled scanners (missing mcp-ai-scanner,
  nova-proximity, ramparts, and mislabeled mcp-scan as "Invariant Labs"
  when it is now "Snyk Agent Scan").
* Documented `install` / `remove` CLI subcommands as canonical. Those
  are now hidden aliases for `enable` / `disable`.
* Claimed `auto_scan_quarantined` defaults to `true` and
  `integrity_check_on_restart` to `true`. Both default to `false` in
  code.
* Referred to the scanner configure path writing secrets to the OS
  keyring. The path no longer touches the OS keyring by default on
  macOS (see PR #372 F-03 follow-up); env values are stored directly
  in the scanner record in BBolt.
* Had no mention of `--all`, `--async`, `--dry-run`, `--scanners`,
  `rescan`, `cancel-all`, `status`, the source-resolution order, the
  aggregated-report "Scanners: X run, Y failed" line, the `available
  / pulling / installed / configured / error` vocabulary, SSE events,
  or the scanner plugin contract.

This commit:

1. Rewrites docs/features/security-scanner-plugins.md to reflect the
   actual v0.23 scanner system, with all 7 bundled scanners, accurate
   config defaults, the full REST API surface, SSE events, and a
   pointer to the new CLI reference.

2. Adds docs/cli/security-commands.md (~760 lines) — an extensive
   reference covering every `mcpproxy security` subcommand, flags,
   status vocabulary, output formats, live-progress behavior, source
   resolution, typical workflows (interactive, CI/scripting, false-
   positive triage), and common failure modes. Mirrors the style of
   the existing activity-commands.md and management-commands.md.

3. Updates docs/cli/command-reference.md with a short Security Scanner
   Commands section pointing at the new extensive reference. Matches
   the pattern used for Tool Quarantine / Code Execution.

No code changes. Documentation only. Verified every subcommand in the
new reference exists in the actual CLI by diffing against
`mcpproxy security --help` output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(039): docs build, UI calm-mode, npx volume source extraction

- docs: drop links to untracked qa/security-scanners-2026-04-10.html that
  were breaking the Docusaurus build on PR #372 and Cloudflare Pages
- ui: tone down Security tab info banners — Docker Isolated, Local Process
  and HTTP Server are now neutral cards; only "No Source Available" keeps
  the alert-error color. Local Process is no longer framed as a warning
  because running without Docker isolation is a user choice, not a fault
- ui: clarify the Security Scanners description to state that each scanner
  runs inside an isolated Docker container sandboxed from the host
- ui: hide Dashboard Docker-isolation and Quarantine chips until their
  status has been fetched, to avoid flashing a false "disabled" warning
  on initial page load
- scanner: extend the container source resolver with a two-strategy
  extraction for npx/uvx servers — the target package is now located
  directly via `docker exec` and copied out because volume-mounted caches
  (e.g. /root/.npm) never show up in `docker diff`, while `docker diff` is
  still used to pick up additional user-added source in /app or /src

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Code <noreply@anthropic.com>
D
Dumbris committed
3f975ab973631357a5bf34e810a36a7d5b02758f
Parent: 6fbf914
Committed by GitHub <noreply@github.com> on 4/10/2026, 1:04:05 PM