Making large AI models cheaper, faster and more accessible
add paged-attentionv2: support seq length split across thread block (#5707)
Steve Luo committed
7806842f2dbb4b6d6e74014efc7db5be8ccf0bbd
Parent: 18d67d0
Committed by GitHub <noreply@github.com>
on 5/14/2024, 4:46:54 AM