Making large AI models cheaper, faster and more accessible
[Inference]Add CUDA KVCache Kernel (#5406)
* add cuda KVCache kernel
* annotation benchmark_kvcache_copy
* add use cuda
* fix import path
* move benchmark scripts to example/
* rm benchmark codes in test_kv_cache_memcpy.py
* rm redundancy codes
* rm redundancy codes
* pr was modified according to the review
yuehuayingxueluo committed
600881a8ea9b17c436ded922a9d4e3d5969acd87
Parent: 1906118
Committed by GitHub <noreply@github.com>
on 2/28/2024, 6:36:50 AM