perf(native): hoist cleanup regexes out of the per-file hot path (#1874)
* perf(native): hoist cleanup regexes out of the per-file hot path
The adapter cleanup pass runs once per emitted file, but it recompiled the
same regular expressions on every call. Compilation (not matching) dominated
the cost: `CleanupTransformedText` spent ~907us and 1398 allocations on a
~600-byte file body.
- Hoist the 11 static cleanup patterns to package-level vars so they compile
once instead of per file. This includes the pattern that was recompiled
inside a `ReplaceAllStringFunc` callback on every matched import line.
- Replace `aliasStillReferenced`'s dynamic `\balias\b` regex with an
allocation-free word-boundary scan; the alias is always a `\w+` token, so
the scan reproduces Go regexp's `\b` semantics exactly.
- Add a substring fast-reject plus a build-wide compile cache to
`runtimeImportAlreadyExists`, which previously compiled four
alias/module-parameterized patterns per alias.
- LlmApplicationProgrammer name validation: replace the per-call `^[0-9]`
regex with a byte comparison and hoist the identifier pattern.
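The word-boundary scan in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this change: `containsWord` and `isWordByte` are hypothetical names, and the sketch assumes the alias is a non-empty ASCII `\w+` token, as the bullet states.

```go
package main

import (
	"fmt"
	"strings"
)

// isWordByte reports whether c belongs to the ASCII \w class [0-9A-Za-z_],
// which is the class Go's regexp uses to decide \b word boundaries.
func isWordByte(c byte) bool {
	return c == '_' || (c >= '0' && c <= '9') ||
		(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
}

// containsWord (hypothetical name) reports whether word occurs in text with
// a word boundary on both sides, without compiling a regex.
// Assumption: word is non-empty and consists only of \w bytes.
func containsWord(text, word string) bool {
	if word == "" {
		return false
	}
	for from := 0; from+len(word) <= len(text); {
		i := strings.Index(text[from:], word)
		if i < 0 {
			return false
		}
		start := from + i
		end := start + len(word)
		// A boundary exists where the neighbour is absent or not a \w byte.
		leftOK := start == 0 || !isWordByte(text[start-1])
		rightOK := end == len(text) || !isWordByte(text[end])
		if leftOK && rightOK {
			return true
		}
		from = start + 1 // keep looking after this rejected occurrence
	}
	return false
}

func main() {
	fmt.Println(containsWord("const x = typia_1.is(input);", "typia_1"))
	fmt.Println(containsWord("const x = my_typia_1.is(input);", "typia_1"))
}
```

Because the token starts and ends with a `\w` byte, checking that each neighbour is missing or non-`\w` is equivalent to `\b` on both sides, which is why no regex is needed here.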
Behavior is unchanged; the public-API cleanup tests still pass. A new
benchmark pins the win:
  CleanupTransformedText          907us -> 75us   1398 -> 54 allocs   (~12x)
  CleanupTypeScriptTransformText  147us -> 60us    352 -> 33 allocs   (~2.5x)
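To illustrate why hoisting pays off, here is a small sketch that compares a per-call compile against a package-level one. The pattern and function names are made up for the example and are not the actual cleanup patterns; `testing.AllocsPerRun` is used only as a convenient way to count allocations outside a benchmark.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

// trailingSpacePattern is a hypothetical cleanup pattern for illustration.
const trailingSpacePattern = `[ \t]+\n`

// Hoisted: compiled once when the package is initialized.
var trailingSpaceRE = regexp.MustCompile(trailingSpacePattern)

// cleanupPerCall recompiles the pattern on every call (the old shape).
func cleanupPerCall(text string) string {
	re := regexp.MustCompile(trailingSpacePattern)
	return re.ReplaceAllString(text, "\n")
}

// cleanupHoisted reuses the package-level compiled pattern (the new shape).
func cleanupHoisted(text string) string {
	return trailingSpaceRE.ReplaceAllString(text, "\n")
}

func main() {
	body := "export const a = 1;  \nexport const b = 2;\t\n"
	perCall := testing.AllocsPerRun(100, func() { _ = cleanupPerCall(body) })
	hoisted := testing.AllocsPerRun(100, func() { _ = cleanupHoisted(body) })
	fmt.Printf("per-call: %.0f allocs/op, hoisted: %.0f allocs/op\n", perCall, hoisted)
}
```

Both functions return the same text; only the compile cost moves from every call to package initialization.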
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* perf(native): guard cleanup regex passes and scan arrows manually
Builds on the regex-hoisting change with two per-file/per-call-site wins
measured against the 2096-file test-typia-automated transform:
- Skip each cleanup `ReplaceAll` regex behind a cheap substring guard (the
literal a pattern requires). Most emitted files never contain markers like
`input is (`, `| (`, or `import type {`, so the full-text regex scan is
avoided entirely.
- Replace `parenthesizeSingleParameterArrows`' regex with a manual byte scan
reproducing `(^|[\s(=,:?])([A-Za-z_$][A-Za-z0-9_$]*) =>` exactly. The regex ran
on every printed call-site expression, where the regex backtracker dominated;
the scan drops that pass from ~640ms to ~10ms of CPU on the workload.
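The substring guard from the first bullet might look roughly like this. The pattern and the function name are hypothetical; the only idea taken from the change is that a cheap `strings.Contains` on a literal the pattern requires can skip the regex pass entirely.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// emptyTypeImportRE is a hypothetical cleanup pattern. Any match must
// contain the literal "import type {", so that literal is a safe guard.
var emptyTypeImportRE = regexp.MustCompile(`(?m)^import type \{\s*\} from "[^"]*";\n`)

// dropEmptyTypeImports (hypothetical name) runs the regex only when the
// required literal is present in the text.
func dropEmptyTypeImports(text string) string {
	if !strings.Contains(text, "import type {") {
		return text // fast path: the pattern cannot match
	}
	return emptyTypeImportRE.ReplaceAllString(text, "")
}

func main() {
	fmt.Print(dropEmptyTypeImports("import type {} from \"typia\";\nconst a = 1;\n"))
	fmt.Print(dropEmptyTypeImports("const b = 2;\n"))
}
```

The guard is only valid when the literal is required by every possible match; a pattern with alternatives would need one guard per alternative.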
Output is byte-identical across all 2096 files; cleanup unit tests still pass.
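For readers curious what a manual scan for the arrow pattern can look like, here is a sketch under stated assumptions. `parenthesizeArrows` is a hypothetical stand-in, not the function from this change, and the replacement template `${1}(${2}) =>` is assumed rather than taken from the source. The quoted regex is kept only to cross-check the scan.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// arrowRE is the pattern quoted in the commit message, used for comparison.
var arrowRE = regexp.MustCompile(`(^|[\s(=,:?])([A-Za-z_$][A-Za-z0-9_$]*) =>`)

// isIdentByte matches [A-Za-z0-9_$].
func isIdentByte(c byte) bool {
	return c == '_' || c == '$' || (c >= '0' && c <= '9') ||
		(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
}

// isLeadByte matches [\s(=,:?]; Go's \s is [\t\n\f\r ].
func isLeadByte(c byte) bool {
	return strings.IndexByte(" \t\n\f\r(=,:?", c) >= 0
}

// parenthesizeArrows (hypothetical name) wraps a bare single parameter
// before " =>" in parentheses, e.g. "x => x" becomes "(x) => x".
func parenthesizeArrows(src string) string {
	const arrow = " =>"
	var out strings.Builder
	last, from := 0, 0
	for {
		i := strings.Index(src[from:], arrow)
		if i < 0 {
			break
		}
		end := from + i // index of the space before "=>"
		start := end
		for start > 0 && isIdentByte(src[start-1]) {
			start-- // walk back over the identifier run
		}
		startsWithDigit := start < end && src[start] >= '0' && src[start] <= '9'
		if start < end && !startsWithDigit &&
			(start == 0 || isLeadByte(src[start-1])) {
			out.WriteString(src[last:start])
			out.WriteByte('(')
			out.WriteString(src[start:end])
			out.WriteByte(')')
			last = end
		}
		from = end + len(arrow)
	}
	out.WriteString(src[last:])
	return out.String()
}

func main() {
	src := "const f = x => y.map(v => v + 1);"
	fmt.Println(parenthesizeArrows(src))
	fmt.Println(parenthesizeArrows(src) == arrowRE.ReplaceAllString(src, "${1}(${2}) =>"))
}
```

The scan anchors on the literal ` =>` and walks backwards, so text without arrows costs a single `strings.Index` call.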
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(native): make protobuf nested message order deterministic
protobufMessageProgrammer_write_hierarchy emitted nested messages by ranging
the `Children` map directly, and the per-level order tracker was rebuilt from
that same map each iteration, so the emitted `.proto` nested-message order
varied run to run (Go map iteration order). Non-deterministic transform output
breaks reproducible builds, output caching, and diff-based verification.
Track insertion order per hierarchy level in an explicit `Order` slice and emit
children through it, matching the deterministic insertion order the top level
already used (and the order the original TypeScript implementation produced).
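A minimal sketch of the idea, assuming a simplified hierarchy node: the `hierarchy` type and its methods below are illustrative stand-ins, not the actual types. Only the pairing of a `Children` map with an `Order` slice comes from the description above.

```go
package main

import "fmt"

// hierarchy is a simplified stand-in for one level of nested messages.
type hierarchy struct {
	Children map[string]*hierarchy
	Order    []string // keys of Children in first-insertion order
}

func newHierarchy() *hierarchy {
	return &hierarchy{Children: map[string]*hierarchy{}}
}

// child returns the named child, creating it and recording its position
// in Order on first use.
func (h *hierarchy) child(name string) *hierarchy {
	if c, ok := h.Children[name]; ok {
		return c
	}
	c := newHierarchy()
	h.Children[name] = c
	h.Order = append(h.Order, name)
	return c
}

// write emits names by walking Order instead of ranging over the map,
// so the result does not depend on Go's randomized map iteration.
func (h *hierarchy) write(prefix string, out *[]string) {
	for _, name := range h.Order {
		*out = append(*out, prefix+name)
		h.Children[name].write(prefix+name+".", out)
	}
}

func main() {
	root := newHierarchy()
	root.child("Zeta")
	root.child("Alpha").child("Inner")
	root.child("Zeta").child("Leaf")

	var names []string
	root.write("", &names)
	fmt.Println(names)
}
```

The map still provides constant-time lookup; the slice exists only to fix the emission order.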
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* perf(native): memoize type full-name reconstruction per collection
The Set and Map iterators ran metadata_type_full_name on every explored type
just to test a `Set<`/`Map<` prefix, recomputing checker.TypeToString (and
recursing the whole union/intersection name) for the same apparent type twice
per type. The function is pure for a given type pointer, so cache its result on
the per-analysis MetadataCollection.
On the 2096-file test-typia-automated transform this nearly halves the
function's allocations (3.03M -> 1.62M objects) and cuts total allocation bytes
~11.5% (2003MB -> 1772MB), lowering GC time. Output is byte-identical and the
metadata/schema/name unit tests pass.
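The caching pattern can be sketched as below. `Type` and `collection` are simplified stand-ins (the real checker types are not reproduced here), and the `computed` counter exists only to show how often the expensive path runs.

```go
package main

import (
	"fmt"
	"strings"
)

// Type is a stand-in for a checker type; union lists its members, if any.
type Type struct {
	name  string
	union []*Type
}

// collection is a stand-in for a per-analysis collection owning the cache.
type collection struct {
	fullNames map[*Type]string
	computed  int // number of times a name was actually reconstructed
}

// fullName reconstructs a type's name once per type pointer and then
// serves it from the cache.
func (c *collection) fullName(t *Type) string {
	if name, ok := c.fullNames[t]; ok {
		return name
	}
	c.computed++
	name := t.name
	if len(t.union) > 0 {
		parts := make([]string, len(t.union))
		for i, member := range t.union {
			parts[i] = c.fullName(member)
		}
		name = strings.Join(parts, " | ")
	}
	c.fullNames[t] = name
	return name
}

// isSet mirrors the kind of prefix test described above.
func (c *collection) isSet(t *Type) bool {
	return strings.HasPrefix(c.fullName(t), "Set<")
}

func main() {
	c := &collection{fullNames: map[*Type]string{}}
	set := &Type{name: "Set<string>"}
	num := &Type{name: "number"}
	u := &Type{union: []*Type{set, num}}

	fmt.Println(c.fullName(u), c.fullName(u), c.isSet(set))
	fmt.Println("reconstructions:", c.computed)
}
```

Keying on the pointer is safe only because the name is a pure function of the type; the cache lives on the collection so it is discarded with each analysis.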
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* perf(native): pre-size collection clone and cache full-name reconstruction
Two incremental allocation cuts on the metadata path, both verified
byte-identical across the 2096-file test-typia-automated transform:
- MetadataCollection.Clone (taken before every intersection exploration, so a
hot site) now pre-sizes its destination maps via maps.Clone/slices.Clone
instead of copying into the empty maps NewMetadataCollection allocated, which
forced rehashing as entries were re-inserted.
- getName memoizes metadataCollection_getFullName (checker.TypeToString,
recursing generics/unions) per type pointer. Only the pure full-name
reconstruction is cached; the duplicate-numbering bookkeeping still runs on
every call, so resolved names and their ordering are unchanged.
Total transform allocation drops a further ~3% (1.77GB -> 1.72GB); combined
with the per-collection type-name cache the metadata path now allocates ~14%
less than baseline. Go test suite stays green.
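The clone change in the first bullet can be pictured with the sketch below, which assumes Go 1.21 or later for the `maps` and `slices` packages. The `collection` type and its fields are hypothetical; only the switch from re-inserting into fresh maps to `maps.Clone`/`slices.Clone` follows the description.

```go
package main

import (
	"fmt"
	"maps"
	"slices"
)

// collection is a simplified stand-in with one map and one slice field.
type collection struct {
	names map[string]int
	order []string
}

// cloneByReinsert shows the old shape: start from an empty map and let it
// grow as entries are copied in one by one.
func (c *collection) cloneByReinsert() *collection {
	dst := &collection{names: map[string]int{}}
	for k, v := range c.names {
		dst.names[k] = v
	}
	dst.order = append(dst.order, c.order...)
	return dst
}

// clone shows the new shape: the standard library copies each container
// in one step. Both calls make shallow copies.
func (c *collection) clone() *collection {
	return &collection{
		names: maps.Clone(c.names),
		order: slices.Clone(c.order),
	}
}

func main() {
	src := &collection{names: map[string]int{"A": 1, "B": 2}, order: []string{"A", "B"}}
	dup := src.clone()
	dup.names["C"] = 3
	dup.order = append(dup.order, "C")
	fmt.Println(len(src.names), len(dup.names), src.order, dup.order)
}
```

Shallow copies are enough here only if the stored values are themselves safe to share between the original and the clone.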
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Jeongho Nam committed
ebfe8488eaad682d764676eb66d4ec3c562c2f96
Parent: 11c512d
Committed by GitHub <noreply@github.com>
on 6/1/2026, 6:50:13 AM