Making large AI models cheaper, faster and more accessible
[npu] support triangle attention for llama (#5130)
* update fused attn
* update spda
* tri attn
* update triangle
* import
* fix
* fix
Xuanlei Zhao committed
d6df19bae7cdb9e116c1f218a4465855623c80b1
Parent: f4e72c9
Committed by GitHub <noreply@github.com> on 11/30/2023, 6:21:30 AM