COMMITS
April 15, 2026
Validate safetensors data offsets (#3364)
Dan Anderson committed
Fixes for CUDA CI (#3413)
Cheng committed
Jaccl refactor (#3412)
Angelos Katharopoulos committed
Update nanobind version to v2.12.0 (#3396)
jrp2014 committed
April 14, 2026
Add clear_streams API for cleanup before exit (#3395)
Cheng committed
April 11, 2026
Avoid joining threads on exit (#3388)
Cheng committed
April 10, 2026
Fix int16 overflow in SDPA NAX mask indexing for KV sequences > 32K (#3361)
Clydingus committed
April 9, 2026
Conjugate VJP and JVP support (#3386)
Cameron Churchwell committed
April 8, 2026
Fix test "test get streams" missing initialization (#3376)
Daniil Seredkin committed
[CUDA] Thread safety (#3367)
Cheng committed
April 7, 2026
Fix: Correct cross-attention query routing in Post-LN TransformerDecoderLayer (#3382)
Shantanu Suryawanshi committed
April 6, 2026
fix: fail build when Metal compiler header resolution fails (#3332)
Doğukan Veziroğlu committed
[CUDA] Add GatherQMM for quantized gather matmul (#3321)
Long Yixing committed
Fix CMake finding wrong Python during pip install (#3375)
Harrison Powers committed
April 3, 2026
Add a convenience for making local streams in python (#3355)
Angelos Katharopoulos committed
April 2, 2026
Add printoptions (#3333)
Christophe Prat committed
April 1, 2026
Use `metal` as the front-end for the metal linker (#3354)
Valentin Roussellet committed
Fix regression in array creation (#3353)
Angelos Katharopoulos committed
[CUDA] 3/5/6-bit quants for qmm_naive (#3352)
Cheng committed
Make CommandEncoder thread local (#3348)
Cheng committed
[CUDA] Fallback QMM (#3315)
Cheng committed
[Metal] Support sorting complex numbers (#3314)
Long Yixing committed
Add fftfreq, rfftfreq and scalar axes for fftshift/ifftshift (#3298)
declanhealy2 committed
Add vmap for BroadcastAxes (#3344)
Angelos Katharopoulos committed
March 31, 2026
Decouple CommandEncoder from Device (#3316)
Cheng committed
Fix use after move (#3343)
Angelos Katharopoulos committed
Bump actions/deploy-pages from 4 to 5 (#3334)
dependabot[bot] committed
March 30, 2026
Remove no longer needed const_cast (#3325)
Cheng committed
Fix np bfloat16 misinterpreted as complex (#3146)
Kellen Sun committed
March 26, 2026
[CUDA] Implement BlockMaskedMM (#3299)
Long Yixing committed