TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
Quick setup
Get started by creating a new file or uploading an existing file. We recommend every repository include a README, LICENSE, and .gitignore.
https://gitmorph.com/0xSero/turboquant.git
CREATE A NEW REPOSITORY ON THE COMMAND LINE
touch README.md
git init
git checkout -b main
git add README.md
git commit -m "first commit"
git remote add origin https://gitmorph.com/0xSero/turboquant.git
git push -u origin main
PUSH AN EXISTING REPOSITORY
git remote add origin https://gitmorph.com/0xSero/turboquant.git
git push -u origin main