TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration