COMMITS
/ tests/test-backend-ops.cpp
May 30, 2026
metal : restore im2col implementation for large kernels (#23901)
Georgi Gerganov committed
May 29, 2026
ggml-webgpu: add q4_0/q8_0 SET_ROWS (#23760)
Reese Levine committed
May 28, 2026
vulkan: fast path for walsh-hadamard transform (#23687)
Jeff Bolz committed
May 26, 2026
tests: test-backend-ops -j <N> to run tests in parallel (#23637)
Jeff Bolz committed
May 25, 2026
CUDA: add fast walsh-hadamard transform (#23615)
Aman Gupta committed
May 21, 2026
metal : optimize concat kernel and fix set kernel threads (#23411)
Georgi Gerganov committed
hexagon: ssm-conv fix for large prompts (#23307)
Todor Boinovski committed
May 17, 2026
vulkan: Support unaligned tensors for ROPE (#22637)
Jeff Bolz committed
May 16, 2026
llama + spec: MTP Support (#22673)
Aman Gupta committed
May 15, 2026
May 14, 2026
ggml-webgpu: Enable NVIDIA self-hosted CI (#22976)
Reese Levine committed
May 11, 2026
Ggml/cuda snake fusion hardening (#22912)
Pascal committed
May 9, 2026
Add flash attention MMA / Tiles to support MiMo-V2.5 (#22812)
AesSedai committed
May 8, 2026
cuda: fuse snake activation (mul, sin, sqr, mul, add) (#22667)
Pascal committed
May 7, 2026
CUDA: batch out_prod inner loop with cublasSgemmStridedBatched (#22651)
leonardHONG committed
tests: add long-sequence cases and fix inputs for gated_delta_net (#22794)
HaoJun ZHANG committed
May 5, 2026
May 1, 2026
vulkan: Support asymmetric FA in coopmat2 path (#21753)
Jeff Bolz committed
April 29, 2026
CUDA: fuse SSM_CONV + ADD(bias) + SILU (#22478)
Anav Prasad committed
April 28, 2026
ggml-cuda: Repost of 21896: Blackwell native NVFP4 support (#22196)
Michael Wand committed
ggml-webgpu: fix buffer aliasing for ssm_scan and refactor aliasing logic (#22456)
Reese Levine committed
April 23, 2026
CUDA: fuse relu + sqr (#22249)
Anav Prasad committed
April 14, 2026
metal : add XIELU unary op (#20802)
Seyoung Jeong committed
April 13, 2026
vulkan: Flash Attention DP4A shader for quantized KV cache (#20797)
Ruben Ortlam committed
CUDA: Limit DeviceSegmentedSort to immediate mode (#21718)
Oliver Simons committed
April 10, 2026
vulkan: Support Q1_0 (#21539)
Jeff Bolz committed
April 8, 2026
metal: Q1_0 backend (#21528)
Pasha Khosravi committed
April 7, 2026
ggml : deprecate GGML_OP_ADD1 (#21363)
Georgi Gerganov committed
March 30, 2026
CUDA : Fix CUB's argsort when nrows % block_size == 0 CCCL < 3.1 (#21181)
Oliver Simons committed
March 26, 2026
ggml-cuda: Add NVFP4 dp4a kernel (#20644)
Michael Wand committed
CUDA & CPU: support F32 kernel type for `CONV_TRANSPOSE_2D` (#17094)
Yihao Wang committed
March 24, 2026
metal : add FA instantiations for HSK=512, HSV=512 (#20902)
Georgi Gerganov committed
March 14, 2026
metal : add FA specialization for HSK = 320, HSV = 256 (#20549)
Georgi Gerganov committed
March 12, 2026
vulkan: add GATED_DELTA_NET op support (#20334)
ProgenyAlpha committed
vulkan: fix l2_norm epsilon handling (#20350)
Jeff Bolz committed
March 11, 2026
llama : enable chunked fused GDN path (#20340)
Georgi Gerganov committed
ggml : add NVFP4 quantization type support (#19769)
Richard Davison committed
March 7, 2026
ggml: add GATED_DELTA_NET op (#19504)
Aman Gupta committed
March 6, 2026
Autoparser - complete refactoring of parser architecture (#18675)
Piotr Wilkin (ilintar) committed
CUDA: use shared mem for ssm_conv (#20128)
Aman Gupta committed
March 5, 2026
chore : correct typos [no ci] (#20041)
Marcel Petrick committed
March 2, 2026
ggml-webgpu: Support non-contiguous `src0` and overlapping `src0/src1` in binary ops (#19850)
Masashi Yoshimura committed
February 20, 2026
test: mul_mat tests with huge batch size (#19519)
Jeff Bolz committed
February 15, 2026
ggml : avoid UB in gemm ukernel (#19642)
Georgi Gerganov committed
February 14, 2026
vulkan: support L2_NORM with contiguous rows (#19604)
Jeff Bolz committed
February 13, 2026
fix vulkan ggml_acc only works in 3d but not 4d (#19426)
ymcki committed
metal : support GGML_OP_SET (#19548)
Georgi Gerganov committed
February 12, 2026
metal : update sum_rows kernel to support float4 (#19524)
Georgi Gerganov committed
February 11, 2026
ggml : unary ops support non-cont src0 + metal F16 unary ops (#19511)
Georgi Gerganov committed