COMMITS
April 29, 2026
Swap FlashMLA combine grid dimensions (#182)
Perkz Zheng committed
March 31, 2026
February 6, 2026
Add CUDAGuard and device id assignment in sm100 dense fmha (#160)
Zeyu WANG committed
January 19, 2026
Add missing include<span>
Jiashi Li committed
January 16, 2026
Multiple updates and refactorings (#150)
Shengyu Liu committed
September 30, 2025
Update README
Jiashi Li committed
Code format
Jiashi Li committed
Fix error message
Jiashi Li committed
Update blog and README
Shengyu Liu committed
September 29, 2025
Rename deep dive blog
Shengyu Liu committed
Add Deep-Dive Blog for the New Sparse Decoding Kernel on Hopper (#100)
Shengyu Liu committed
Add Sparse Decoding Kernel and Sparse Prefill Kernel for Blackwell
Simon Mo committed
Merge pull request #98 from deepseek-ai/open-source-h
Shengyu Liu committed
Merge remote-tracking branch 'github/main' into open-source-h
Shengyu Liu committed
Fill in link to DSv3.2 paper
Shengyu Liu committed
September 24, 2025
Add a comment
Shengyu Liu committed
Reorganize files and add sparse prefill/decoding kernels on hopper
Shengyu Liu committed
September 22, 2025
Refine handling for q/v sequence length equals zero. (#92)
zhang committed
August 27, 2025
fix calc space bug (#91)
Zeyu WANG committed
August 25, 2025
Remove cudaMalloc and cudaFree in backward (#87)
Li Xiang committed
Remove tma padding for fwd inputs (#85)
zhang committed
August 14, 2025
Fix accuracy issue in sum_OdO kernel
Jiashi Li committed
Drop support for CUDA <12.8
Jiashi Li committed
August 1, 2025
Add more GPU architctures support (#76)
Zeyu WANG committed
April 29, 2025
update .gitignore
ljss committed
update to cutlass 3.9
ljss committed
April 28, 2025
Fix synchronization issues
ljss committed
April 23, 2025
Fix LaTeX render error (#74)
Shengyu Liu committed
Minor fix to the docs to correct FlashAttention-3's paper link and typos (#73)
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 committed