COMMITS
src/llama-context.cpp

March 31, 2026
fix: correct misspellings in code comments (#21217)
lainon1 committed
March 24, 2026
ggml-backend: re-enable graph reuse with pipeline parallelism (#20927)
Aman Gupta committed
March 21, 2026
context : use n_embd_out for pooled embedding extraction (#20840)
Tom Hillbrunner committed
March 20, 2026
context: zero output buffer on allocation (#20781)
Ruikai Peng committed
March 18, 2026
context : fix graph not resetting when control vector changes (#20381)
Andreas Obersteiner committed
March 13, 2026
llama : fix pooling assertion crash in chunked GDN detection path (#20468)
ZeroV0LT committed
March 12, 2026
llama : disable graph reuse with pipeline parallelism (#20463)
Georgi Gerganov committed
March 11, 2026
llama : enable chunked fused GDN path (#20340)
Georgi Gerganov committed
March 9, 2026
llama: dynamic head_dim and n_rot for SWA (#20301)
Xuan-Son Nguyen committed
March 8, 2026
llama: end-to-end tests (#19802)
Johannes Gäßler committed
March 7, 2026
ggml: add GATED_DELTA_NET op (#19504)
Aman Gupta committed
March 6, 2026
context: ignore zero scale LoRAs when checking sameness (#20166)
Tim Neumann committed
March 5, 2026
chore : correct typos [no ci] (#20041)
Marcel Petrick committed
February 23, 2026
llama : remove write/read of output ids/logits/embeddings (#18862)
Daniel Bevenius committed
February 19, 2026
llama : use output_resolve_row() in get_logits_ith/get_embeddings_ith (#19663)
Daniel Bevenius committed
model : full modern bert support (#18330)
Ryan Mangeno committed
February 16, 2026
graph : fix KQ mask, lora, cvec reuse checks (#19644)
Georgi Gerganov committed
February 15, 2026
context : fix output reorder with backend sampling (#19638)
Georgi Gerganov committed
February 14, 2026
llama : update LoRA API. + fix excessive graph reserves (#19280)
agent-enemy-2 committed
February 11, 2026
llama : refactor sampling_info to use buffer_view template (#19368)
Daniel Bevenius committed
February 10, 2026
models : support qwen3.5 series (#19468)
JJJYmmm committed
February 9, 2026
revert : "[Model] Qwen3.5 dense and MoE support (no vision) (#19435)" (#19453)
Georgi Gerganov committed
February 8, 2026
[Model] Qwen3.5 dense and MoE support (no vision) (#19435)
Piotr Wilkin (ilintar) committed
February 6, 2026
Kimi-Linear support (backend agnostic + MLA KV cache) (#18755)
ymcki committed
February 3, 2026
sampling : delegate input allocation to the scheduler (#19266)
Georgi Gerganov committed
February 2, 2026
metal : support virtual devices (#18919)
Georgi Gerganov committed
January 28, 2026
sampling : remove sampling branching in output_reserve (#18811)
Daniel Bevenius committed
January 26, 2026
graph : fix nkvo offload with FA (#19105)
Georgi Gerganov committed
January 25, 2026
kv-cache : support V-less cache (#19067)
Georgi Gerganov committed
completion : fix prompt cache for recurrent models (#19045)
Georgi Gerganov committed
January 23, 2026
graph : utilize `ggml_build_forward_select()` to avoid reallocations (#18898)
Georgi Gerganov committed
January 15, 2026
context : do not reserve scheduler for warmups (#18867)
Georgi Gerganov committed
context : reserve new scheduler when graph topology changes (#18547)
Georgi Gerganov committed
lora: make sure model keep track of associated adapters (#18490)
Xuan-Son Nguyen committed
January 5, 2026
model : add LFM2-ColBert-350M (#18607)
Tarek Dakhran committed
January 4, 2026
sampling : add support for backend sampling (#17004)
Daniel Bevenius committed
January 3, 2026
context : fix reserve token padding to n_seqs (#18536)
Georgi Gerganov committed
December 30, 2025
lora: count lora nodes in graph_max_nodes (#18469)
Xuan-Son Nguyen committed
December 27, 2025
llama: fix magic number of 999 for GPU layers (#18266)
Johannes Gäßler committed
December 22, 2025
tool/ex/tests: consistently free ctx, then model (#18168)
Johannes Gäßler committed
December 15, 2025
December 14, 2025
models : fix YaRN regression + consolidate logic (#18006)
Georgi Gerganov committed
December 13, 2025
llama_context: synchronize before reallocating output buffer (#17974)
Jeff Bolz committed
December 10, 2025
ggml : remove GGML_KQ_MASK_PAD constant (#17910)
Georgi Gerganov committed
December 8, 2025
Make graph_max_nodes vary by ubatch size (#17794)
Piotr Wilkin (ilintar) committed
November 28, 2025
model : Qwen3 Next (#16095)
Piotr Wilkin (ilintar) committed
November 24, 2025
llama : skip output reordering for single token batches (#17466)
Daniel Bevenius committed
November 7, 2025
hparams : add n_embd_inp() to support extended embed (#16928)
Sigbjørn Skjæret committed