
perf(vm): add ContainerLen opcode for .length() intrinsic (#3600)

## Summary

- Adds a dedicated `ContainerLen` opcode that replaces native function
calls to `.length()` on Array, Map, String, and Uint8Array
- The MIR lowering now intercepts these calls and emits `Rvalue::Len`
(which already existed in the IR but was never generated), and the
bytecode emitter maps it to the new single-byte opcode instead of a
`Call` instruction
- Also switches the native function call arg buffer from `Vec<Value>` to
`SmallVec<[Value; 4]>`, avoiding heap allocation for native calls with
≤4 args (the common case)
- Fixes a crash in the speedtest tool when `--baml` is not passed (CLI
path wasn't resolved to absolute)
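
As a rough illustration of the lowering change described above, the sketch below models the interception in a toy IR. Only `Rvalue::Len`, `ContainerLen`, and `Call` are names taken from this description; `Ty`, `Op`, `lower_method_call`, and `emit` are hypothetical stand-ins and not the compiler's actual types or functions.

```rust
// Toy model of the lowering step. NOT the real BAML compiler types:
// `Ty`, `Op`, `lower_method_call`, and `emit` are hypothetical names.

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Array,
    Map,
    String,
    Uint8Array,
    Int,
}

#[derive(Debug, Clone, PartialEq)]
enum Rvalue {
    /// Generic method call that goes through the native call path.
    Call { method: String, receiver: usize, args: Vec<usize> },
    /// Length of the container held in the given local.
    Len(usize),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Call,
    ContainerLen,
}

/// Intercept `.length()` on the four container types and emit `Rvalue::Len`;
/// every other method call keeps the generic `Call` form.
fn lower_method_call(receiver_ty: &Ty, method: &str, receiver: usize, args: Vec<usize>) -> Rvalue {
    let is_container = matches!(
        receiver_ty,
        Ty::Array | Ty::Map | Ty::String | Ty::Uint8Array
    );
    if is_container && method == "length" && args.is_empty() {
        Rvalue::Len(receiver)
    } else {
        Rvalue::Call { method: method.to_string(), receiver, args }
    }
}

/// Map an rvalue to the opcode the bytecode emitter would pick in this toy model.
fn emit(rvalue: &Rvalue) -> Op {
    match rvalue {
        Rvalue::Len(_) => Op::ContainerLen,
        Rvalue::Call { .. } => Op::Call,
    }
}
```

In this model, `.length()` on an `Array` local lowers to `Rvalue::Len` and is emitted as `Op::ContainerLen`, while any other method (or `.length()` on a non-container type) still becomes `Op::Call`.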

### Why this matters

Profiling showed that `.length()` on arrays goes through the full native
call path: resolve callable → drain args into heap-allocated Vec → call
Rust fn (which just reads `vec.len()`) → free Vec → push result. For
bubble sort's inner loop, that's ~25M native function calls for what
should be a single field read.
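
The difference between the two paths can be sketched with a toy stack machine. This is a simplified model under stated assumptions, not the VM's actual code: `Value`, `NativeFn`, `native_length`, `exec_native_call`, and `exec_container_len` are hypothetical names, callable resolution is omitted, and string length is taken as byte length for simplicity.

```rust
// Toy stack machine contrasting the two paths. All names are hypothetical.

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Array(Vec<Value>),
    Str(String),
}

type NativeFn = fn(&[Value]) -> Value;

/// Stand-in for the native `.length()` implementation.
fn native_length(args: &[Value]) -> Value {
    match &args[0] {
        Value::Array(items) => Value::Int(items.len() as i64),
        // Byte length is a simplification made for this sketch.
        Value::Str(s) => Value::Int(s.len() as i64),
        other => panic!("length() called on {:?}", other),
    }
}

/// Old path: drain the args into a heap-allocated Vec, call the Rust fn,
/// free the Vec, push the result.
fn exec_native_call(stack: &mut Vec<Value>, func: NativeFn, argc: usize) {
    let start = stack.len() - argc;
    let args: Vec<Value> = stack.drain(start..).collect(); // one allocation per call
    let result = func(&args);
    drop(args); // the arg buffer is freed before the result is pushed
    stack.push(result);
}

/// New path: read the length of the operand on top of the stack and
/// replace it with the result, with no arg buffer and no function call.
fn exec_container_len(stack: &mut Vec<Value>) {
    let len = match stack.last().expect("operand on stack") {
        Value::Array(items) => items.len(),
        Value::Str(s) => s.len(),
        other => panic!("ContainerLen on {:?}", other),
    };
    *stack.last_mut().expect("operand on stack") = Value::Int(len as i64);
}
```

Both functions leave the same value on the stack; the second simply skips the argument buffer and the indirect call, which is the overhead the profile attributed to `.length()`.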

### Speedtest results (`speedtest compare canary vbv_container-len-opcode`)

| Workload | Change |
|----------|--------|
| bubble sort 5k | **-20%** |
| substring slice 100k | **-21%** |
| array build+sum 100k | **-20%** |
| grid alloc 1000x100 | **-20%** |
| string split 100k | **-11%** |
| split short literal 100k | **-10%** |
| contains medium literal 100k | **-10%** |
| trim padded literal 100k | **-10%** |

No regressions on any benchmark.

## Test plan

- [x] `cargo clippy` passes
- [x] `baml-cli run --file bubble_sort.baml main` produces correct
output (5001)
- [x] `speedtest run --build` passes all 30 workloads
- [x] `speedtest compare` shows improvements on `.length()`-heavy
workloads, no regressions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Performance Improvements**
  * Container/string length is now a dedicated VM operation and the VM
    avoids heap allocation for common native-call cases, improving runtime
    efficiency.

* **Debug / Tooling**
  * Disassembly and debug output recognize and display the new length
    operation.
  * Speedtest runner resolves CLI paths to absolute locations and reports
    branch from saved baseline for more reliable results.

* **Chores**
  * Added workspace dependency to support the changes; speedtest
    persistence API now accepts an explicit CLI override.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
hellovai committed
a756eb5ca0eb39e4f00a77fdf0b229f99cada107
Parent: ddd0ff1
Committed by GitHub <noreply@github.com> on 5/29/2026, 2:54:00 PM