COMMITS
May 13, 2026
Fix typo in README.md for mixed precision support
Haicheng Wu committed
Fix typo in CHANGELOG.md for mixed precision
Haicheng Wu committed
fix for thor (#3224)
Observer007 committed
May 12, 2026
update to 4.5 (#3228)
Haicheng Wu committed
Add Snake activation functor for EVT (#3184)
Emre Albayrak committed
May 11, 2026
[CuTeDSL] Fix loop carried target scope (#3200)
TungtungQia committed
May 7, 2026
[CuTeDSL] Update atomic_max_float32 to atomic_fmax in blockscaled GEMM example (#3206)
questa-quan-wang committed
May 6, 2026
v4.5 tag update (#3202)
Junkai-Wu committed
April 25, 2026
[Hopper CuTeDSL] Add FP8 GEMM with 2xAcc (#3149)
Johnsonms committed
fix: Add missing kElementsPerAccess division in RegularTileIterator store (#3049)
Blake Ledden committed
April 21, 2026
Small Tile N BlockScaled GEMM + Grouped GEMM (#3176)
dePaul Miller committed
April 17, 2026
Add `absf` and `floor` to `cute.math` (#3156)
Nandor Licker committed
Add support for empty dataclass arguments (#3152)
Nandor Licker committed
April 9, 2026
Update blackwell tutorial to be compatible with 4.5-dev version (#3130)
Longsheng Du committed
April 8, 2026
Update the release note for 4.5 dev (#3154)
brandonsun committed
April 7, 2026
v4.5 dev update. (#3153)
Junkai-Wu committed
April 2, 2026
PR update (#3103)
Katja Sirazitdinova committed
March 24, 2026
Merge pull request #3126 from keithzzzzz/main
drazi committed
[CLI] Fix tutorial issues
Zheng Linfeng committed
March 18, 2026
[Hopper CuTeDSL] Add grouped GEMM persistent kernel and tests (#3091)
Johnsonms committed
March 17, 2026
v4.4.2 update. (#3104)
Junkai-Wu committed
[CLI] add cutedsl fp16 gemm tutorial from 2 to 6 (#3106)
Linfeng Zheng committed
March 12, 2026
docs: Fix float16 documentation in elementwise_add notebook (#2949) (#3047)
Blake Ledden committed
March 7, 2026
Support for Group GEMM in CUTLASS Profiler for Geforce and Spark (#3092)
dePaul Miller committed
March 5, 2026
[fix] Boolean.__dsl_and__ emits arith.andi directly for i1 operands (#3087)
Johnsonms committed
Fix finding cuDNN (#2890)
TLescoatTFX committed