Fix int16 overflow in l2_sqr_int8_neon SIMD distance
vmulq_s16(diff, diff) produced int16 results, but diff can be up to 255 for int8 vectors (-128 vs 127), and 255^2 = 65025 overflows int16 (max 32767). This caused NaN/wrong results for int8 vectors with large differences. Fix: use vmull_s16 (widening multiply) to produce int32 results directly, avoiding the intermediate int16 overflow. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A
Alex Garcia committed
7de925be70b9b2c3cc6e17ffa65eb5e688f12e67
Parent: 4bee883