CUDA: use tensor cores for MMQ (#7676)
* CUDA: int8 tensor cores for MMQ (legacy quants)
* fix out-of-bounds writes
* __builtin_assume -> GGML_CUDA_ASSUME
* fix writeback returning too early
Johannes Gäßler committed
1f0dabda8d5c131f9d4632aa41de74317cdd61fb
Parent: af4ae50
Committed by GitHub <noreply@github.com>
on 6/10/2024, 9:45:13 AM