
Fix int16 overflow in l2_sqr_int8_neon SIMD distance

vmulq_s16(diff, diff) produced int16 results, but for int8 vectors diff
can reach 255 in magnitude (-128 vs 127), and 255^2 = 65025 overflows
int16 (max 32767), wrapping to -511. Any |diff| above 181 overflows.
This caused NaN/wrong results for int8 vectors with large differences.

Fix: use vmull_s16 (widening multiply) to produce int32 results
directly, avoiding the intermediate int16 overflow.
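
The overflow and the widening fix described above can be illustrated with
a minimal sketch. It is not the code from this commit: the function names
l2_sqr_int8_ref, l2_sqr_int8_narrow, l2_sqr_int8_widening, and
check_extreme_case are hypothetical, the loop layout (8 lanes per
iteration, scalar tail) is an assumption, and the narrow variant is a
scalar emulation of a 16-bit multiply rather than a real vmulq_s16 call.
On targets without NEON, the widening variant falls back to the scalar
reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar reference: every product is computed in 32 bits. */
static int32_t l2_sqr_int8_ref(const int8_t *a, const int8_t *b, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t d = (int32_t)a[i] - (int32_t)b[i];
        sum += d * d;
    }
    return sum;
}

/* Emulates the bug: each squared difference is truncated to 16 bits and
 * reinterpreted as a signed value, as a non-widening s16 multiply would. */
static int32_t l2_sqr_int8_narrow(const int8_t *a, const int8_t *b, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t d = (int32_t)a[i] - (int32_t)b[i];
        int32_t low = (d * d) & 0xFFFF;              /* keep the low 16 bits */
        sum += (low >= 0x8000) ? low - 0x10000 : low; /* two's-complement wrap */
    }
    return sum;
}

/* Widening variant: vmull_s16 yields int32 lanes, so no product wraps.
 * The int32 accumulator is assumed large enough for the vector length. */
static int32_t l2_sqr_int8_widening(const int8_t *a, const int8_t *b, size_t n) {
#if defined(__ARM_NEON)
    int32x4_t acc = vdupq_n_s32(0);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        int16x8_t diff = vsubl_s8(vld1_s8(a + i), vld1_s8(b + i));
        int16x4_t lo = vget_low_s16(diff);
        int16x4_t hi = vget_high_s16(diff);
        acc = vaddq_s32(acc, vmull_s16(lo, lo));
        acc = vaddq_s32(acc, vmull_s16(hi, hi));
    }
    int32_t sum = vgetq_lane_s32(acc, 0) + vgetq_lane_s32(acc, 1) +
                  vgetq_lane_s32(acc, 2) + vgetq_lane_s32(acc, 3);
    /* Scalar tail for lengths that are not a multiple of 8. */
    return sum + l2_sqr_int8_ref(a + i, b + i, n - i);
#else
    return l2_sqr_int8_ref(a, b, n);
#endif
}

/* Worst case from the commit message: -128 vs 127 in every lane. */
static void check_extreme_case(void) {
    int8_t a[16], b[16];
    for (int i = 0; i < 16; i++) { a[i] = -128; b[i] = 127; }
    assert(l2_sqr_int8_ref(a, b, 16) == 16 * 65025);      /* 1040400 */
    assert(l2_sqr_int8_narrow(a, b, 16) == 16 * -511);    /* -8176 */
    assert(l2_sqr_int8_widening(a, b, 16) == 16 * 65025); /* 1040400 */
}
```

Calling check_extreme_case() from a test harness exercises all three
variants. The negative total from the narrow variant shows how a squared
distance can end up below zero, which is one plausible route to NaN if a
square root is taken afterwards.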

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Alex Garcia committed 2mo ago

7de925be70b9b2c3cc6e17ffa65eb5e688f12e67

Parent: 4bee883