COMMITS
/ tests/test-backend-ops.cpp
March 30, 2026
CUDA : Fix CUB's argsort when nrows % block_size == 0 CCCL < 3.1 (#21181)
Oliver Simons committed
March 26, 2026
ggml-cuda: Add NVFP4 dp4a kernel (#20644)
Michael Wand committed
CUDA & CPU: support F32 kernel type for `CONV_TRANSPOSE_2D` (#17094)
Yihao Wang committed
March 24, 2026
metal : add FA instantiations for HSK=512, HSV=512 (#20902)
Georgi Gerganov committed
March 14, 2026
metal : add FA specialization for HSK = 320, HSV = 256 (#20549)
Georgi Gerganov committed
March 12, 2026
vulkan: add GATED_DELTA_NET op support (#20334)
ProgenyAlpha committed
vulkan: fix l2_norm epsilon handling (#20350)
Jeff Bolz committed
March 11, 2026
llama : enable chunked fused GDN path (#20340)
Georgi Gerganov committed
ggml : add NVFP4 quantization type support (#19769)
Richard Davison committed
March 7, 2026
ggml: add GATED_DELTA_NET op (#19504)
Aman Gupta committed
March 6, 2026
Autoparser - complete refactoring of parser architecture (#18675)
Piotr Wilkin (ilintar) committed
CUDA: use shared mem for ssm_conv (#20128)
Aman Gupta committed
March 5, 2026
chore : correct typos [no ci] (#20041)
Marcel Petrick committed
March 2, 2026
ggml-webgpu: Support non-contiguous `src0` and overlapping `src0/src1` in binary ops (#19850)
Masashi Yoshimura committed
February 20, 2026
test: mul_mat tests with huge batch size (#19519)
Jeff Bolz committed
February 15, 2026
ggml : avoid UB in gemm ukernel (#19642)
Georgi Gerganov committed
February 14, 2026
vulkan: support L2_NORM with contiguous rows (#19604)
Jeff Bolz committed
February 13, 2026
fix vulkan ggml_acc only works in 3d but not 4d (#19426)
ymcki committed
metal : support GGML_OP_SET (#19548)
Georgi Gerganov committed
February 12, 2026
metal : update sum_rows kernel to support float4 (#19524)
Georgi Gerganov committed
February 11, 2026
ggml : unary ops support non-cont src0 + metal F16 unary ops (#19511)
Georgi Gerganov committed
ggml : extend bin bcast for permuted src1 (#19484)
Georgi Gerganov committed
metal : consolidate unary ops (#19490)
Georgi Gerganov committed
February 10, 2026
test: fix IMROPE perf test case (#19465)
Xuan-Son Nguyen committed
cuda : extend GGML_OP_PAD to work with non-cont src0 (#19429)
Georgi Gerganov committed
February 6, 2026
tests: reduce number of FA test permutations (#19381)
Jeff Bolz committed
February 5, 2026
vulkan: Preprocess FA mask to detect all-neg-inf and all-zero. (#19281)
Jeff Bolz committed
February 4, 2026
tests : add non-cont, inplace rope tests (#19296)
Georgi Gerganov committed
February 2, 2026
ggml-cpu: FA split across kv for faster TG (#19209)
Aman Gupta committed
January 30, 2026
tests : add GQA=20 FA test (#19095)
Georgi Gerganov committed
January 26, 2026
CUDA: fix padding of GQA to power of 2 in FA (#19115)
Johannes Gäßler committed
January 22, 2026
mla : make the V tensor a view of K (#18986)
Georgi Gerganov committed
January 21, 2026
vulkan: support flash attention GQA/split_k with small batches (#18938)
Jeff Bolz committed
January 16, 2026
ggml : extend ggml_pool_1d + metal (#16429)
Thore Koritzius committed
January 15, 2026
CUDA: Factor out and re-use `block_reduce` function (#18785)
Oliver Simons committed
January 12, 2026
vulkan: Use VK_EXT_shader_64bit_indexing to handle large mat_mul(_id) (#18678)
Jeff Bolz committed
January 10, 2026
test-backend-ops: fix mxfp4 tests on blackwell (#18736)
Aman Gupta committed
January 5, 2026
vulkan: fix topk_moe_sigmoid_norm_bias failures in GLM-4.6 (#18582)
Jeff Bolz committed
vulkan: handle quantize_q8_1 overflowing the max workgroup count (#18515)
Jeff Bolz committed
CANN: add operator fusion support for ADD + RMS_NORM (#17512)
Chenguang Li committed
January 4, 2026
sampling : add support for backend sampling (#17004)
Daniel Bevenius committed
January 1, 2026
vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron (#18295)
Jeff Bolz committed
December 26, 2025
vulkan: Support UPSCALE w/antialias (#18327)
Jeff Bolz committed
vulkan: handle rope with large number of rows (#18306)
Jeff Bolz committed
December 22, 2025
vulkan: Extend rope fusions to allow mrope (#18264)
Jeff Bolz committed
December 21, 2025
vulkan: fix im2col overflowing maxworkgroupcount (#18180)
Jeff Bolz committed
vulkan/cuda: fix topk_moe with exp_probs_b (#18071)
Jeff Bolz committed
December 20, 2025
tests: Avoid floating point precision false positives in SUM (#17471)
Jeff Bolz committed
test-backend-ops: improve msvc build time (#18209)
Jeff Bolz committed