
perf: remove sparse_ngram_index entirely

The sparse n-gram index was a redundant search-acceleration layer
that duplicated ~70% of trigram_index's recall surface on niche
fuzzy queries. Tier 3 (skip_trigram_files scan) + tier 5 (full
outline scan when !trigram_ruled_out) cover the same ground.
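
For readers unfamiliar with the technique, the sketch below shows how a
generic trigram index narrows a search to candidate files and how it can
rule a query out completely. It is not codedb's implementation: the
function names, the sample files, and the `ruled_out` flag (loosely
modeled on the `trigram_ruled_out` condition above) are all hypothetical.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files):
    """Map each trigram to the set of file names whose content contains it."""
    index = defaultdict(set)
    for name, content in files.items():
        for gram in trigrams(content):
            index[gram].add(name)
    return index


def candidates(index, query):
    """Return (candidate_files, ruled_out) for a literal substring query.

    candidate_files is None when the query is shorter than three characters,
    because the index then has nothing to say and the caller must scan.
    ruled_out is True when no indexed file contains every query trigram.
    """
    grams = trigrams(query)
    if not grams:
        return None, False
    result = None
    for gram in grams:
        posting = index.get(gram, set())
        result = set(posting) if result is None else result & posting
    return result, not result


# Hypothetical two-file corpus, for illustration only.
files = {"a.zig": "fn searchContent()", "b.zig": "fn removeFile()"}
index = build_index(files)
print(candidates(index, "search"))  # only a.zig holds every trigram of "search"
print(candidates(index, "zzz"))     # no file holds "zzz", so the index rules it out
```

Only the files in the candidate set need a real content scan. A query
that is too short for the index falls through to a slower scan, which is
the kind of case the later tiers are described as covering.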

Removed:
  • field on Explorer struct
  • initialization and teardown in init, deinit, and releaseSecondaryIndexes
  • indexFile/removeFile call sites in commit + rebuildTrigrams + removeFile
  • tier 2 candidate scan in searchContent
  • approxIndexSizeBytes contribution in telemetry
  • adversarial test for index population (it exercised the removed behavior)
  • test_index.zig regression test for sparse/trigram intersection

End-to-end measurement on codedb's own repo (284 files, 4 MB snapshot):

  codedb_status   4.1µs → 2.5µs   −39%  (previously at "floor")
  codedb_edit     3.5µs → 2.3µs   −34%
  codedb_tree    16.1µs → 10.5µs  −35%
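
The percentages above follow from the usual relative-change formula,
(after − before) / before. A minimal sketch that recomputes them from the
reported timings; the helper name `percent_change` is made up for this
illustration:

```python
def percent_change(before_us, after_us):
    """Relative change from before to after, in percent (negative means faster)."""
    return (after_us - before_us) / before_us * 100.0


# Timings in microseconds, copied from the table above.
measurements = {
    "codedb_status": (4.1, 2.5),
    "codedb_edit": (3.5, 2.3),
    "codedb_tree": (16.1, 10.5),
}

for name, (before, after) in measurements.items():
    print(f"{name}: {percent_change(before, after):.0f}%")
# codedb_status: -39%
# codedb_edit: -34%
# codedb_tree: -35%
```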

Memory stays at ~13 MB on this small corpus; the savings show up on
high-lexical-diversity workloads (large monorepos), where the
sparse n-gram hashmap would otherwise grow large.

633/633 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
justrach committed
e9fca7cd0612e214e4ea1366635aefe2d96e39c0
Parent: 2bb8508