perf(vm): add ContainerLen opcode for .length() intrinsic (#3600)
## Summary

- Adds a dedicated `ContainerLen` opcode that replaces native function calls to `.length()` on Array, Map, String, and Uint8Array
- The MIR lowering now intercepts these calls and emits `Rvalue::Len` (which already existed in the IR but was never generated), and the bytecode emitter maps it to the new single-byte opcode instead of a `Call` instruction
- Also switches the native function call arg buffer from `Vec<Value>` to `SmallVec<[Value; 4]>`, avoiding heap allocation for native calls with ≤4 args (the common case)
- Fixes a crash in the speedtest tool when `--baml` is not passed (CLI path wasn't resolved to absolute)

### Why this matters

Profiling showed that `.length()` on arrays goes through the full native call path: resolve callable → drain args into heap-allocated Vec → call Rust fn (which just reads `vec.len()`) → free Vec → push result. For bubble sort's inner loop, that's ~25M native function calls for what should be a single field read.

### Speedtest results (`speedtest compare canary vbv_container-len-opcode`)

| Workload | Change |
|----------|--------|
| bubble sort 5k | **-20%** |
| substring slice 100k | **-21%** |
| array build+sum 100k | **-20%** |
| grid alloc 1000x100 | **-20%** |
| string split 100k | **-11%** |
| split short literal 100k | **-10%** |
| contains medium literal 100k | **-10%** |
| trim padded literal 100k | **-10%** |

No regressions on any benchmark.
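### Illustration (not the actual VM code)

To make the difference between the two paths concrete, here is a minimal sketch of a toy stack VM. Only the opcode name `ContainerLen` comes from this PR; `Value`, `Op`, `CallNative`, `run`, and the helper functions are hypothetical names invented for the example, and the real dispatch loop, value representation, and string-length semantics may well differ.

```rust
// Toy stack VM, written only to contrast the two ways of computing a length.
// All names except `ContainerLen` are hypothetical, not taken from the codebase.

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
    Array(Vec<Value>),
}

#[derive(Debug, Clone, Copy)]
enum Op {
    /// Pop `argc` arguments, call native function number `index`, push the result.
    CallNative { index: usize, argc: usize },
    /// Pop a container and push its length, with no call machinery involved.
    ContainerLen,
}

type NativeFn = fn(&[Value]) -> Value;

/// Shared length logic. Using byte length for strings is an assumption of this sketch.
fn container_len(v: &Value) -> i64 {
    match v {
        Value::Str(s) => s.len() as i64,
        Value::Array(a) => a.len() as i64,
        Value::Int(_) => panic!("length() is not defined for Int"),
    }
}

/// The native `.length()` implementation reached through the call path.
fn native_length(args: &[Value]) -> Value {
    Value::Int(container_len(&args[0]))
}

fn run(ops: &[Op], natives: &[NativeFn], stack: &mut Vec<Value>) {
    for op in ops {
        match *op {
            Op::CallNative { index, argc } => {
                // Call path: look up the callable, drain the arguments into a
                // freshly allocated Vec, call, then drop the Vec again.
                let start = stack.len() - argc;
                let args: Vec<Value> = stack.drain(start..).collect();
                let result = natives[index](&args);
                stack.push(result);
            }
            Op::ContainerLen => {
                // Opcode path: read the length directly from the popped value.
                let container = stack.pop().expect("stack underflow");
                stack.push(Value::Int(container_len(&container)));
            }
        }
    }
}

/// Computes a length through the native call path.
fn length_via_call(v: Value) -> Value {
    let natives: [NativeFn; 1] = [native_length];
    let mut stack = vec![v];
    run(&[Op::CallNative { index: 0, argc: 1 }], &natives, &mut stack);
    stack.pop().expect("result on stack")
}

/// Computes a length through the dedicated opcode.
fn length_via_opcode(v: Value) -> Value {
    let mut stack = vec![v];
    run(&[Op::ContainerLen], &[], &mut stack);
    stack.pop().expect("result on stack")
}
```

Both helpers return the same value for the same input; the difference is that the call path allocates and frees an argument buffer on every invocation, which is the per-iteration cost described above.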
## Test plan

- [x] `cargo clippy` passes
- [x] `baml-cli run --file bubble_sort.baml main` produces correct output (5001)
- [x] `speedtest run --build` passes all 30 workloads
- [x] `speedtest compare` shows improvements on `.length()`-heavy workloads, no regressions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Summary by CodeRabbit

* **Performance Improvements**
  * Container/string length is now a dedicated VM operation and the VM avoids heap allocation for common native-call cases, improving runtime efficiency.
* **Debug / Tooling**
  * Disassembly and debug output recognize and display the new length operation.
  * Speedtest runner resolves CLI paths to absolute locations and reports branch from saved baseline for more reliable results.
* **Chores**
  * Added workspace dependency to support the changes; speedtest persistence API now accepts an explicit CLI override.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
hellovai committed
a756eb5ca0eb39e4f00a77fdf0b229f99cada107
Parent: ddd0ff1
Committed by GitHub <noreply@github.com>
on 5/29/2026, 2:54:00 PM