Embed tree-sitter wasm as ~268 chunked base64 string literals
Three previous approaches all failed on Windows in subtly different ways:
1. Single 274KB base64 string literal: bun's Windows minifier dropped
or transformed it (the post-build check could not find the string's
prefix in the binary, even though the embed step had written the file).
2. `with { type: 'file' }` from a node_modules subpath: bytes ended up
in the binary but the import variable was bound to undefined at
runtime — bun on Windows mishandles the JS-level binding for that
attribute.
3. `with { type: 'file' }` from a relative path (wasm copied into
pre-init/): same as #2 — confirms it's not subpath-vs-relative,
it's a bun/Windows bug with the import-attribute binding.
Round 4: write the base64 as ~268 small chunks (1024 chars each) in an
exported array, joined and decoded at runtime in the pre-init. Every
chunk is referenced unconditionally at runtime via .join(''), so
dead-code elimination (DCE) can't remove it, and each chunk is small
enough that no minifier heuristic should single it out as a "huge
string literal" worth dropping.
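
A minimal sketch of the chunking scheme described above, not the code in build-binary.ts. The helper names splitIntoChunks and decodeChunks are hypothetical; only the 1024-char chunk size, the .join('') and the base64 decode come from this message. Because 1024 is a multiple of 4, no chunk boundary splits a base64 quantum, although the sketch joins before decoding anyway.

```typescript
import { Buffer } from 'node:buffer';

// Chunk size taken from the commit message; everything else is illustrative.
const CHUNK_SIZE = 1024;

// Split a base64 string into consecutive slices of at most `size` characters.
function splitIntoChunks(base64: string, size: number = CHUNK_SIZE): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < base64.length; i += size) {
    chunks.push(base64.slice(i, i + size));
  }
  return chunks;
}

// Reassemble the chunks and decode them back into the original bytes.
function decodeChunks(chunks: readonly string[]): Buffer {
  return Buffer.from(chunks.join(''), 'base64');
}
```

As a sanity check on the numbers in this message: 205488 bytes encode to 273,984 base64 characters, which splits into 268 chunks when the chunk size is 1024.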
- cli/scripts/build-binary.ts: embedTreeSitterWasmAsChunks() writes the
full array and returns sample chunks (start/middle/end) that the
post-build verification scan looks for in the compiled binary. It
restores the empty stub both eagerly and via process.on('exit').
- cli/src/pre-init/tree-sitter-wasm-bytes.ts: re-introduced as a stub
exporting an empty readonly string[]. Dev-mode and unit tests see
the empty stub; production builds get the real chunks written in by
build-binary.ts.
- cli/src/pre-init/tree-sitter-wasm.ts: import the chunks, .join(''),
Buffer.from(_, 'base64'), publish on globalThis. The if() guard
remains because dev mode legitimately has zero chunks.
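
The pre-init step in the last bullet might look roughly like the sketch below. It is an illustration under assumptions, not the contents of tree-sitter-wasm.ts: the function name publishWasmFromChunks and the global key __treeSitterWasmBinary are hypothetical, since this message does not name either.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical global key; the commit message does not say which name is used.
const GLOBAL_KEY = '__treeSitterWasmBinary';

// Join the chunks, decode them, and publish the bytes on `target`.
// Returns the number of bytes published, or 0 when there is nothing to publish.
function publishWasmFromChunks(
  chunks: readonly string[],
  target: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): number {
  // Dev mode and unit tests see the empty stub, so an empty array is a
  // legitimate state and not an error.
  if (chunks.length === 0) return 0;
  const bytes = Buffer.from(chunks.join(''), 'base64');
  target[GLOBAL_KEY] = bytes;
  return bytes.byteLength;
}
```

Passing the target object explicitly keeps the sketch testable without touching the real globalThis.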
Verified locally: build embeds 268 chunks, post-build verifies 3 sample
chunks at distinct offsets in the compiled binary, --smoke-tree-sitter
exits 0 with "tree-sitter smoke ok (wasmBinary, 205488 bytes)", full
smoke passes.
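
For reference, the sample-chunk verification could be sketched as follows. The names pickSampleChunks and findMissingSamples are hypothetical, and the sketch assumes the compiled binary stores the string literals as plain ASCII bytes; if the bundler stored them in another encoding, a byte scan like this would not find them.

```typescript
import { Buffer } from 'node:buffer';

// Pick the first, middle, and last chunk (deduplicated for very short arrays).
function pickSampleChunks(chunks: readonly string[]): string[] {
  if (chunks.length === 0) return [];
  const indices = [0, Math.floor(chunks.length / 2), chunks.length - 1];
  const picked: string[] = [];
  for (const i of Array.from(new Set(indices))) {
    const chunk = chunks[i];
    if (chunk !== undefined) picked.push(chunk);
  }
  return picked;
}

// Return every sample that does NOT occur in the binary's bytes.
// An empty result means all samples were found.
function findMissingSamples(binary: Buffer, samples: readonly string[]): string[] {
  return samples.filter((sample) => !binary.includes(sample));
}
```

A caller would read the compiled binary into a Buffer (for example with readFileSync from node:fs) and fail the build if findMissingSamples returns anything.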
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
James Grugett committed
3ad502b0e1677f4dc12afae8a4f99c3ddbaeedcd
Parent: e505cc7