fix(039): resolve 13 QA findings in security scanner plugin system (#372)
* fix(039): resolve 13 QA findings in security scanner plugin system
Bundled fix for bugs surfaced by a full QA pass of Spec 039. Each finding has
a one-line code reference and a regression test where applicable.
Critical:
* F-01 security approve now actually unquarantines the server. The scanner
Service gets a new ServerUnquarantiner dependency injected from
server.Server.UnquarantineServer; ApproveServer calls it after persisting
the integrity baseline. Tool indexing and quarantine-bucket removal follow
the existing unquarantine path. Critical-findings guard still blocks the
call before anything mutates. TestServiceApproveServerCallsUnquarantiner
and TestServiceApproveServerCriticalDoesNotUnquarantine pin the contract.
* F-02 Source resolver no longer misreads a server's positional data-dir
argument as source code. For package-runner commands (npx, uvx, pipx,
pnpm dlx, bunx, yarn dlx) resolveFromPackageCache runs first; arg-scan
fallback only accepts directories containing a source marker
(package.json, pyproject.toml, setup.py, Cargo.toml, go.mod, or a source
file). Regression tests cover the filesystem-server false-positive case
and the legitimate python-script arg case.
High:
* F-03 macOS keychain probe no longer pops the "Keychain Not Found" modal.
KeyringProvider.IsAvailable uses a read-only Get probe with a 2s timeout
and a headless-environment fast path instead of keyring.Set. Public Get,
Store, Delete, List, and both registry helpers are wrapped in a new
runWithTimeout helper (3s hard deadline) backed by ErrKeyringTimeout.
Scanner ConfigureScanner already falls back to in-config storage on Store
errors. TestKeyringProvider_IsAvailable_HangingBackend proves the bail-out.
* F-04 UI Approve buttons are now scanner-aware. ServerCard, ServerDetail
and ScanReport call the new securityApproveServer store action. A custom
modal gates force-approve on servers with no scan or with critical
findings; the legacy unquarantineServer path is retained only for admin
and the Scan First action.
* F-05 CLI security scan no longer hangs after the job reports completed.
The poll loop now terminates on completed|failed|cancelled and prints live
progress lines (N run, M running, F failed).
Medium:
* F-06 --dry-run prints a structured plan (source method, source path,
scanner images, commands, timeouts) and exits without invoking the scan
endpoint. No containers start.
* F-09 Scanner status vocabulary unified between table and JSON output
(available | pulling | installed | configured | error).
* F-10 security report text output now contains "Scanners: X run, Y failed
(names) of Z" and a coverage warning when any scanner did not run, so a
user cannot mistake scanner crashes for "clean".
* F-11 Scan history endpoint returns the aggregated risk_score instead of
always zero.
UX:
* F-12 Dashboard "Security Scan" chip is now a live router-link to /security
with no "soon" placeholder; matches adjacent Docker / Quarantine chips.
* F-14 security overview shows "Last scan: never" instead of
0001-01-01 00:00:00, and JSON emits last_scan_at: null.
* F-15 security configure returns in ~3s (was 60s+), bumped client timeout
from 10s to 60s as additional safety, added existence prefetch so typos
return 404 fast.
* F-16 scan --all redraws the progress table in place on TTY via ANSI
cursor escapes; falls back to per-line output when piped; FINDINGS column
now populated from findings_count.
Extra:
* mcpproxy security subcommands now honor --config and --data-dir global
flags (newSecurityCLIClient checks the package-level configFile/dataDir
before calling config.LoadFromFile, mirroring main.go).
Not fixed (documented):
* F-07 ramparts scanner container ships a GLIBC_2.39-linked binary that
fails on arm64 macOS. Upstream scanner image issue, not mcpproxy code.
* F-13 cisco-mcp-scanner stdout contains a hardcoded server_url header.
Cosmetic upstream scanner output quirk.
Post-fix revalidation: 6 of 7 scanners run end-to-end on arm64 macOS
against the three test servers (everything, fetch-test, filesystem-test).
The QA report in docs/qa/security-scanners-2026-04-10.html now contains a
§11 "After-Fix Revalidation" section with per-finding verification
evidence, updated code-landscape table, and reproduction instructions.
Tests: internal/secret, internal/security, internal/security/scanner,
cmd/mcpproxy, frontend unit tests all pass. golangci-lint: 0 issues.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(039): never call keyring.Set from scanner configure on macOS (F-03 follow-up)
The previous fix in this branch removed the IsAvailable probe that was
popping the "Keychain Not Found" system modal, but the modal continued to
appear on the user's machine during every `mcpproxy security configure`
call with the real scanner_<id>_<env> key (screenshotted).
Root cause: my runWithTimeout helper wrapped keyring.Set in a goroutine
and returned ErrKeyringTimeout after 3s if the goroutine didn't finish.
That protects the server process from blocking but does NOT cancel the
goroutine — Go has no goroutine cancellation, and the underlying
Security.framework call continues running in the background. While it
runs, macOS keeps the modal on the user's screen. Worse: each subsequent
Store call spawns another zombie goroutine.
Fix: eliminate the keyring.Set call from the scanner configure path
entirely.
1. internal/security/scanner/service.go — ConfigureScanner now stores
scanner env values directly in the scanner's ConfiguredEnv map in
BBolt. No SecretStore call. Scanner env vars end up in the scanner
container's /proc/environ at scan time anyway, so keyring storage adds
no meaningful confidentiality. Users who want keyring-backed storage
for a specific secret can still pass a `${keyring:my-name}` reference
as the env value — the resolver expands it via a read-only Get at
scan time, which is safe.
2. internal/secret/keyring_provider.go — KeyringProvider.Store now
refuses to call keyring.Set on darwin by default. The opt-in is
MCPPROXY_KEYRING_WRITE=1 (env) or SetWritesEnabled(true) (programmatic,
for tests). On Linux/Windows the default is unchanged. The previous
three-layer guard (known-unavailable cache, first-time probe, failure
cache) still applies behind the macOS gate for the opt-in case.
3. internal/secret/keyring_provider_test.go — new regression test
TestKeyringProvider_Store_MacOSDefaultRefuses pins the exact user
scenario (configure mcp-scan SNYK_TOKEN on macOS) and asserts that
keyring.Set is NOT called. Other Store tests updated to explicitly
opt in via SetWritesEnabled(true) so they exercise the three-layer
guard path regardless of runtime.GOOS.
Verified on arm64 macOS: `time mcpproxy security configure mcp-scan --env
SNYK_TOKEN=...` returns in 0.041s (was 3s, before that 60s+), emits zero
keyring log lines, and does NOT pop the macOS modal. Subsequent scan of
the "everything" server runs mcp-scan successfully using the token
pulled from ConfiguredEnv.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(039): document the security scanner system and add an extensive CLI reference
The existing docs/features/security-scanner-plugins.md was out of date
relative to the actual v0.23 scanner system:
* Listed only 4 of the 7 bundled scanners (missing mcp-ai-scanner,
nova-proximity, ramparts, and mislabeled mcp-scan as "Invariant Labs"
when it is now "Snyk Agent Scan").
* Documented `install` / `remove` CLI subcommands as canonical. Those
are now hidden aliases for `enable` / `disable`.
* Claimed `auto_scan_quarantined` defaults to `true` and
`integrity_check_on_restart` to `true`. Both default to `false` in
code.
* Referred to the scanner configure path writing secrets to the OS
keyring. The path no longer touches the OS keyring by default on
macOS (see PR #372 F-03 follow-up); env values are stored directly
in the scanner record in BBolt.
* Had no mention of `--all`, `--async`, `--dry-run`, `--scanners`,
`rescan`, `cancel-all`, `status`, the source-resolution order, the
aggregated-report "Scanners: X run, Y failed" line, the `available
/ pulling / installed / configured / error` vocabulary, SSE events,
or the scanner plugin contract.
This commit:
1. Rewrites docs/features/security-scanner-plugins.md to reflect the
actual v0.23 scanner system, with all 7 bundled scanners, accurate
config defaults, the full REST API surface, SSE events, and a
pointer to the new CLI reference.
2. Adds docs/cli/security-commands.md (~760 lines) — an extensive
reference covering every `mcpproxy security` subcommand, flags,
status vocabulary, output formats, live-progress behavior, source
resolution, typical workflows (interactive, CI/scripting, false-
positive triage), and common failure modes. Mirrors the style of
the existing activity-commands.md and management-commands.md.
3. Updates docs/cli/command-reference.md with a short Security Scanner
Commands section pointing at the new extensive reference. Matches
the pattern used for Tool Quarantine / Code Execution.
No code changes. Documentation only. Verified every subcommand in the
new reference exists in the actual CLI by diffing against
`mcpproxy security --help` output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(039): docs build, UI calm-mode, npx volume source extraction
- docs: drop links to untracked qa/security-scanners-2026-04-10.html that
were breaking the Docusaurus build on PR #372 and Cloudflare Pages
- ui: tone down Security tab info banners — Docker Isolated, Local Process
and HTTP Server are now neutral cards; only "No Source Available" keeps
the alert-error color. Local Process is no longer framed as a warning
because running without Docker isolation is a user choice, not a fault
- ui: clarify the Security Scanners description to state that each scanner
runs inside an isolated Docker container sandboxed from the host
- ui: hide Dashboard Docker-isolation and Quarantine chips until their
status has been fetched, to avoid flashing a false "disabled" warning
on initial page load
- scanner: extend the container source resolver with a two-strategy
extraction for npx/uvx servers — the target package is now located
directly via `docker exec` and copied out because volume-mounted caches
(e.g. /root/.npm) never show up in `docker diff`, while `docker diff` is
still used to pick up additional user-added source in /app or /src
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Code <noreply@anthropic.com> D
Dumbris committed
3f975ab973631357a5bf34e810a36a7d5b02758f
Parent: 6fbf914
Committed by GitHub <noreply@github.com>
on 4/10/2026, 1:04:05 PM