0xSero / turboquant

TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration

RELEASES

No releases
