CUDA: use tensor cores for MMQ (#7676)

* CUDA: int8 tensor cores for MMQ (legacy quants)

* fix out-of-bounds writes

* __builtin_assume -> GGML_CUDA_ASSUME

* fix writeback returning too early
Johannes Gäßler committed
1f0dabda8d5c131f9d4632aa41de74317cdd61fb
Parent: af4ae50
Committed by GitHub <noreply@github.com> on 6/10/2024, 9:45:13 AM