use sorted inserts to improve builder performance (#91553)
## What

Optimizes AMQF (Approximate Membership Query Filter) construction in `StreamingSstWriter` by deferring filter building to `close()` time and using qfilter's sorted `Builder` API.

## Why

The previous approach inserted each key hash into the AMQF filter eagerly during `add()`, using random-access `insert_fingerprint` calls. The new qfilter `Builder` API supports sequential sorted insertion which is significantly faster, but requires fingerprints in non-decreasing order.

## How

1. **Deferred construction**: Instead of building the AMQF incrementally during `add()`, collect key hashes (truncated to `u32`) into a vec during writes, then build the filter in one pass at `close()` time.
2. **Sorted Builder insertion**: Sort collected hashes by fingerprint value, then feed them to `Builder::insert_fingerprint` in order. This uses the Builder's optimized sorted-insert path.
3. **u32 storage**: Since fingerprint size is always ≤32 bits, store collected hashes as `u32` instead of `u64`, halving memory usage and improving sort cache behavior.
4. **Exact sizing**: The Builder is constructed with the exact entry count (known at `close()` time) rather than the `max_entry_count` estimate, producing optimally-sized filters.

## Benchmark results (vs `filter_ref` baseline)

`write/key_8/value_4/` benchmark:

| Entries | filter_ref | sorted_insert | Change |
|---------|------------|---------------|--------|
| 85K | 22.3 ms | 21.1 ms | **-5.3%** |
| 853K | 149.2 ms | 111.9 ms | **-25.0%** |
| 8.3M | 1049 ms | 998.8 ms | **-4.8%** |

The 853K case (typical compacted SST size) shows the largest improvement as AMQF construction is a significant fraction of total write time at that scale.
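
## Sketch

The deferred, sorted construction described above can be sketched roughly as follows. This is an illustration only and is not the code from this change: `SortedBuilder`, `DeferredFilterWriter`, and `OutOfOrder` are hypothetical stand-ins rather than the qfilter or `StreamingSstWriter` APIs, and taking the low 32 bits of the hash as the fingerprint is an assumption made for the example.

```rust
/// Error returned when a fingerprint is smaller than the previous one.
/// (Hypothetical type, used only in this sketch.)
#[derive(Debug, PartialEq)]
pub struct OutOfOrder;

/// Hypothetical stand-in for a filter builder with a sorted-insert path.
/// It only records fingerprints and enforces non-decreasing order.
pub struct SortedBuilder {
    fingerprints: Vec<u32>,
    last: Option<u32>,
}

impl SortedBuilder {
    /// Sized from the exact entry count instead of an up-front estimate.
    pub fn with_exact_capacity(entries: usize) -> Self {
        SortedBuilder {
            fingerprints: Vec::with_capacity(entries),
            last: None,
        }
    }

    /// Accepts fingerprints in non-decreasing order only.
    pub fn insert_fingerprint(&mut self, fp: u32) -> Result<(), OutOfOrder> {
        if let Some(last) = self.last {
            if fp < last {
                return Err(OutOfOrder);
            }
        }
        self.fingerprints.push(fp);
        self.last = Some(fp);
        Ok(())
    }

    /// Finishes the build; here the "filter" is just the stored fingerprints.
    pub fn build(self) -> Vec<u32> {
        self.fingerprints
    }
}

/// Hypothetical writer that defers filter construction to `close()`.
pub struct DeferredFilterWriter {
    hashes: Vec<u32>,
}

impl DeferredFilterWriter {
    pub fn new() -> Self {
        DeferredFilterWriter { hashes: Vec::new() }
    }

    /// During writes, only collect the hash, truncated to `u32`.
    /// Assumption: the fingerprint comes from the low 32 bits.
    pub fn add(&mut self, key_hash: u64) {
        self.hashes.push(key_hash as u32);
    }

    /// At close time: sort once, then feed the builder in order.
    pub fn close(mut self) -> Vec<u32> {
        self.hashes.sort_unstable();
        let mut builder = SortedBuilder::with_exact_capacity(self.hashes.len());
        for &fp in &self.hashes {
            builder
                .insert_fingerprint(fp)
                .expect("hashes were sorted before insertion");
        }
        builder.build()
    }
}
```

In this sketch, `add()` does no filter work at all. The sort and the single in-order pass happen once in `close()`, which is also the first point at which the exact entry count is known.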
Luke Sandberg committed
b43c4efb155e7cd8a4e3a2d914049a6d6dafaa5a
Parent: d3cbd5b
Committed by GitHub <noreply@github.com>
on 4/6/2026, 8:27:04 PM