
Embed tree-sitter wasm as ~268 chunked base64 string literals

Three previous approaches all failed on Windows in subtly different ways:

 1. Single 274KB base64 string literal: bun's Windows minifier dropped
    or transformed it (build verified the prefix wasn't in the binary
    even though the embed step wrote the file).
 2. `with { type: 'file' }` from a node_modules subpath: bytes ended up
    in the binary but the import variable was bound to undefined at
    runtime — bun on Windows mishandles the JS-level binding for that
    attribute.
 3. `with { type: 'file' }` from a relative path (wasm copied into
    pre-init/): same as #2 — confirms it's not subpath-vs-relative,
    it's a bun/Windows bug with the import-attribute binding.

Round 4: write the base64 as ~268 small chunks (1024 chars each) in an
exported array, joined and decoded at runtime in the pre-init. Each
chunk is referenced unconditionally at runtime via .join(''), so DCE
can't eliminate it; each is small enough that no minifier heuristic
would treat it as a special "huge string literal" worth dropping.
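
A minimal sketch of the chunk/join round trip, for illustration only. The
helper names toBase64Chunks and fromBase64Chunks are hypothetical and are
not the names used in the repo; only the 1024-char chunk size and the
.join('') + Buffer.from(_, 'base64') decode come from this commit. The
numbers line up: 205488 bytes encode to 273984 base64 chars, which is 267
full chunks plus one 576-char tail, i.e. 268 chunks.

```typescript
// Chunk size taken from the commit message; everything else is illustrative.
const CHUNK_SIZE = 1024;

// Build side: base64-encode the wasm bytes and split into fixed-size pieces.
function toBase64Chunks(
  bytes: Uint8Array,
  chunkSize: number = CHUNK_SIZE,
): string[] {
  const b64 = Buffer.from(bytes).toString("base64");
  const chunks: string[] = [];
  for (let i = 0; i < b64.length; i += chunkSize) {
    chunks.push(b64.slice(i, i + chunkSize));
  }
  return chunks;
}

// Runtime side: join every chunk and decode. Returns undefined for the
// empty dev-mode stub so callers can keep an if() guard.
function fromBase64Chunks(chunks: readonly string[]): Uint8Array | undefined {
  if (chunks.length === 0) return undefined;
  return new Uint8Array(Buffer.from(chunks.join(""), "base64"));
}
```

Splitting only at multiples of 1024 (itself a multiple of 4) keeps every
non-final chunk free of base64 padding, so a plain join reproduces the
original encoded string.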

- cli/scripts/build-binary.ts: embedTreeSitterWasmAsChunks() writes the
  full array and returns sample chunks (start/middle/end) that the
  post-build verification scan looks for in the compiled binary. It
  restores the empty stub both eagerly and via a process.on('exit')
  handler.
- cli/src/pre-init/tree-sitter-wasm-bytes.ts: re-introduced as a stub
  exporting an empty readonly string[]. Dev-mode and unit tests see
  the empty stub; production builds get the real chunks written in by
  build-binary.ts.
- cli/src/pre-init/tree-sitter-wasm.ts: import the chunks, .join(''),
  Buffer.from(_, 'base64'), publish on globalThis. The if() guard
  remains because dev mode legitimately has zero chunks.
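
A rough sketch of the sample-and-scan idea behind the post-build check,
not the actual code in build-binary.ts. pickSampleChunks and
findMissingSamples are hypothetical names, and the scan assumes the
compiled binary stores the chunk literals as plain single-byte ASCII,
which may not hold for every bundler or platform.

```typescript
// Pick the first, middle, and last chunk (deduplicated for tiny arrays).
function pickSampleChunks(chunks: readonly string[]): string[] {
  if (chunks.length === 0) return [];
  const indices = [0, Math.floor(chunks.length / 2), chunks.length - 1];
  return Array.from(new Set(indices)).map((i) => chunks[i]);
}

// Return the samples that do NOT appear anywhere in the binary's bytes.
// An empty result means every sample was found.
function findMissingSamples(
  binary: Uint8Array,
  samples: readonly string[],
): string[] {
  // View the same memory as a Buffer so we can use Buffer#includes.
  const haystack = Buffer.from(
    binary.buffer,
    binary.byteOffset,
    binary.byteLength,
  );
  return samples.filter((s) => !haystack.includes(s, 0, "latin1"));
}
```

A build script could read the compiled binary, call findMissingSamples
with the samples returned at embed time, and fail the build if the result
is non-empty.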

Verified locally: build embeds 268 chunks, post-build verifies 3 sample
chunks at distinct offsets in the compiled binary, --smoke-tree-sitter
exits 0 with "tree-sitter smoke ok (wasmBinary, 205488 bytes)", full
smoke passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
James Grugett committed
3ad502b0e1677f4dc12afae8a4f99c3ddbaeedcd
Parent: e505cc7