AVX VNNI auto-activation for MSVC; HAVE_VNNI256 path for IQ4_XS_R8 and Qx_0 R4 quants. (#1991)
* AVX VNNI auto-activation

Enables auto-detection of AVX VNNI and its definition in the CMakeLists, so that ik_llama.cpp detects it.

* IQ4_XS R8: Enable AVX-VNNI 256-bit path with MSVC compatibility

Migrate mul_mat_iq4_xs_r8_q8_k_avx2() from HAVE_FANCY_SIMD to HAVE_VNNI256.

Changes (6 guard sites + 8 intrinsic calls in iqk_gemm_kquants.cpp):
- Replaced 3x #ifdef HAVE_FANCY_SIMD with #ifdef HAVE_VNNI256
- Replaced 3x #ifndef HAVE_FANCY_SIMD with #ifndef HAVE_VNNI256
- Replaced 8x raw _mm256_dpbusd_epi32 with ggml_mm256_dpbusd_epi32
  (the ggml wrapper resolves to _mm256_dpbusd_avx_epi32 on MSVC via the
  iqk_config.h macro, which is the correct MSVC AVX-VNNI intrinsic
  available under /arch:AVX2; raw _mm256_dpbusd_epi32 does not exist in
  MSVC headers without AVX-512)

Impact:
- IQ4_XS_R8 matmul now uses VNNI256 on CPUs with AVX-VNNI but no AVX-512
  (e.g. Intel Arrow Lake / Core Ultra 265K)
- Previously this path was available only under HAVE_FANCY_SIMD (full AVX-512)
- This path is exercised when models are loaded with -rtr / --run-time-repack
  (in-memory repack) or when using --repack to create a permanent
  IQ4_XS_R8 file. Standard IQ4_XS does not auto-convert to IQ4_XS_R8 at
  load time.

* Qx_0 R4 legacy quants: Enable VNNI256 path for AVX-VNNI CPUs with MSVC compatibility

Three changes in iqk_gemm_legacy_quants.cpp:

1. DotHelper (line 23): Extend VNNI condition to include HAVE_VNNI256
   (not just __AVX512VNNI__+VL) and use ggml_mm256_dpbusd_epi32 wrapper
   for MSVC compatibility. This fixes the Q6_0 non-R4 path and all other
   quant types routed through UnsignedDot/SignedDot.

2. accum_q4_0_quants (line 994), mul_mat_q5_0_r4_q8_2_avx2 (lines 1202,
   1223), mul_mat_q6_0_r4_q8_2_avx2 (lines 1375, 1394): Replace
   #ifdef HAVE_FANCY_SIMD / #ifndef HAVE_FANCY_SIMD with HAVE_VNNI256
   (which correctly detects AVX-VNNI without requiring full AVX-512).
   Also replace raw _mm256_dpbusd_epi32 with ggml_mm256_dpbusd_epi32
   wrapper.

These paths were dead code on Arrow Lake (HAVE_FANCY_SIMD requires full AVX-512, which Arrow Lake lacks). Now they compile and use the hardware VNNI instruction (vpdpbusd) via __AVXVNNI__.

Note: remaining HAVE_FANCY_SIMD guards in this file guard true AVX-512 paths (_mm512_* intrinsics) and are left unchanged.

* Simplify def
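
The guard-plus-wrapper pattern described in this commit can be illustrated with a minimal, self-contained sketch. It is not the code from iqk_config.h: the macro SKETCH_HAVE_VNNI256 and the functions dpbusd_ref and dpbusd are hypothetical names introduced only for this illustration, and the detection condition is an assumption about how such a guard could be written. The scalar function models what one 256-bit vpdpbusd does (eight int32 lanes, each accumulating four unsigned-byte by signed-byte products); the wrapper picks the AVX-VNNI intrinsic, the AVX-512 VNNI+VL intrinsic, or the scalar fallback depending on what the compiler reports.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical guard for this sketch only; the commit's real macro is HAVE_VNNI256.
#if defined(__AVXVNNI__) || (defined(__AVX512VNNI__) && defined(__AVX512VL__))
#define SKETCH_HAVE_VNNI256 1
#include <immintrin.h>
#endif

using Acc   = std::array<int32_t, 8>;   // eight 32-bit accumulator lanes
using U8x32 = std::array<uint8_t, 32>;  // unsigned byte operand
using S8x32 = std::array<int8_t, 32>;   // signed byte operand

// Scalar model of one 256-bit vpdpbusd: each lane adds four u8*s8 products
// to its accumulator, wrapping on overflow (no saturation).
Acc dpbusd_ref(Acc acc, const U8x32& a, const S8x32& b) {
    for (int lane = 0; lane < 8; ++lane) {
        int32_t sum = 0;
        for (int k = 0; k < 4; ++k) {
            sum += int32_t(a[4 * lane + k]) * int32_t(b[4 * lane + k]);
        }
        acc[lane] = int32_t(uint32_t(acc[lane]) + uint32_t(sum));
    }
    return acc;
}

// Wrapper: one call site, with the intrinsic chosen at compile time.
Acc dpbusd(Acc acc, const U8x32& a, const S8x32& b) {
#ifdef SKETCH_HAVE_VNNI256
    __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a.data()));
    __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b.data()));
    __m256i vc = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(acc.data()));
#if defined(__AVXVNNI__)
    vc = _mm256_dpbusd_avx_epi32(vc, va, vb);  // AVX-VNNI (VEX-encoded) form
#else
    vc = _mm256_dpbusd_epi32(vc, va, vb);      // AVX-512 VNNI + VL form
#endif
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(acc.data()), vc);
    return acc;
#else
    return dpbusd_ref(acc, a, b);              // no VNNI reported: scalar fallback
#endif
}
```

Built without any VNNI flag, dpbusd simply forwards to the scalar model, so the sketch compiles and runs on any x86 or non-x86 target; built with a flag such as -mavxvnni on GCC or Clang, the same call site uses the hardware instruction. Keeping the choice inside one wrapper is what lets call sites stay free of compiler-specific intrinsic names.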
Nexes the Elder committed
b3dfb7858cfcb9166e92f366e5af87f19ebc94be
Parent: 3b81f63
Committed by GitHub <noreply@github.com>
on 6/18/2026, 4:05:19 PM