TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
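The wiki itself does not yet describe TurboQuant's algorithm, so as a rough orientation here is a minimal, purely illustrative sketch of what low-bit KV cache quantization looks like in general: per-token uniform quantization of the key cache to 3 bits and the value cache to 2 bits. All function names and parameters below are hypothetical and are not TurboQuant's actual API, kernels, or (near-optimal) quantization scheme.

```python
# Illustrative sketch only: generic per-token uniform (asymmetric) quantization
# of KV cache tensors to low bit widths. Not TurboQuant's actual method.
import torch


def quantize_uniform(x: torch.Tensor, n_bits: int):
    """Quantize the last dim of `x` to `n_bits` with a per-token scale/zero-point."""
    qmax = (1 << n_bits) - 1
    x_min = x.amin(dim=-1, keepdim=True)
    x_max = x.amax(dim=-1, keepdim=True)
    scale = (x_max - x_min).clamp(min=1e-8) / qmax
    zero_point = x_min
    q = torch.round((x - zero_point) / scale).clamp(0, qmax).to(torch.uint8)
    return q, scale, zero_point


def dequantize_uniform(q: torch.Tensor, scale: torch.Tensor, zero_point: torch.Tensor):
    """Reconstruct an approximate float tensor from the quantized codes."""
    return q.to(scale.dtype) * scale + zero_point


if __name__ == "__main__":
    # Toy KV cache layout: (batch, heads, seq_len, head_dim)
    keys = torch.randn(1, 8, 128, 64)
    values = torch.randn(1, 8, 128, 64)

    k_q, k_s, k_z = quantize_uniform(keys, n_bits=3)    # 3-bit keys
    v_q, v_s, v_z = quantize_uniform(values, n_bits=2)  # 2-bit values

    k_hat = dequantize_uniform(k_q, k_s, k_z)
    v_hat = dequantize_uniform(v_q, v_s, v_z)
    print("key MSE:", (keys - k_hat).pow(2).mean().item())
    print("value MSE:", (values - v_hat).pow(2).mean().item())
```

In a real deployment the codes would be bit-packed and dequantized inside fused attention kernels (e.g. Triton) rather than materialized as uint8 tensors; this snippet only shows the quantize/dequantize round trip and the asymmetric bit budget for keys versus values.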