COMMITS / include/llama.h
March 25, 2026
llama: fix llama-model-saver (#20503)
Johannes Gäßler committed
March 18, 2026
llama : re-enable manual LoRA adapter free (#19983)
Pop Flamingo committed
March 11, 2026
ggml : add NVFP4 quantization type support (#19769)
Richard Davison committed
March 8, 2026
llama: end-to-end tests (#19802)
Johannes Gäßler committed
March 5, 2026
chore : correct typos [no ci] (#20041)
Marcel Petrick committed
February 20, 2026
quantize : add --dry-run option (#19526)
ddh0 committed
February 14, 2026
llama : update LoRA API. + fix excessive graph reserves (#19280)
agent-enemy-2 committed
February 12, 2026
llama : update outdated comment in llama.h (#19428)
Christian Schmitz committed
February 11, 2026
llama : correct typos 'occured' and 'occurences' (#19414)
thecaptain789 committed
January 28, 2026
llama : disable Direct IO by default (#19109)
Georgi Gerganov committed
January 25, 2026
llama: fix integer type consistency in split helpers (#18894)
Jakkala Mahesh committed
January 24, 2026
llama-fit-params: keep explicit --ctx-size 0 (#19070)
Johannes Gäßler committed
January 15, 2026
llama : add adaptive-p sampler (#17927)
ddh0 committed
context : reserve new scheduler when graph topology changes (#18547)
Georgi Gerganov committed
lora: make sure model keep track of associated adapters (#18490)
Xuan-Son Nguyen committed
January 9, 2026
server : use different seeds for child completions (#18700)
Georgi Gerganov committed
January 8, 2026
llama-fit-params: free memory target per device (#18679)
Johannes Gäßler committed
llama : add `use_direct_io` flag for model loading (#18166)
Julius Tischbein committed
January 5, 2026
model : add LFM2-ColBert-350M (#18607)
Tarek Dakhran committed
January 4, 2026
sampling : add support for backend sampling (#17004)
Daniel Bevenius committed
December 30, 2025
lora: count lora nodes in graph_max_nodes (#18469)
Xuan-Son Nguyen committed
December 27, 2025
llama: fix magic number of 999 for GPU layers (#18266)
Johannes Gäßler committed
llama_fit_params: return enum for fail vs. error (#18374)
Johannes Gäßler committed
December 15, 2025
November 25, 2025
llama: introduce support for model-embedded sampling parameters (#17120)
Aaron Teo committed
November 7, 2025
hparams : add n_embd_inp() to support extended embed (#16928)
Sigbjørn Skjæret committed
kv-cache : pad the cache size to 256 for performance (#17046)
Georgi Gerganov committed
November 2, 2025
server : support unified cache across slots (#16736)
Georgi Gerganov committed
docs: remove llama_sampler_accept reference in sampling sample usage (#16920)
Adrian Lundberg committed
October 30, 2025
model: add support for qwen3vl series (#16780)
JJJYmmm committed
October 6, 2025
llama : add --no-host to disable host buffers (#16310)
Gadflyii committed
October 3, 2025
September 24, 2025
llama: print memory breakdown on exit (#15860)
Johannes Gäßler committed
September 5, 2025
aLoRA Support (#15327)
Gabe Goodhart committed
August 31, 2025
sampling : optimize samplers by reusing bucket sort (#15665)
Georgi Gerganov committed
August 30, 2025
llama: use FA + max. GPU layers by default (#15434)
Johannes Gäßler committed
August 28, 2025
model : jina-embeddings-v3 support (#13693)
Sigbjørn Skjæret committed
August 22, 2025
llama : remove KV cache defragmentation logic (#15473)
Georgi Gerganov committed
August 21, 2025
llama : remove deprecated llama_kv_self API (#15472)
Georgi Gerganov committed
kv-cache : drop the "unified" prefix (#15467)
Georgi Gerganov committed
August 14, 2025
server : add SWA checkpoints (#15293)
Georgi Gerganov committed
finetune: SGD optimizer, more CLI args (#13873)
Jonathan Graehl committed
August 5, 2025
llama : add gpt-oss (#15091)
Georgi Gerganov committed
July 31, 2025
llama : allow other bufts when overriding to CPU, add --no-repack option (#14990)
Diego Devesa committed
Add LLaDA 8b Diffusion model (#14771)
Aman Gupta committed
July 24, 2025
context : perform output reorder lazily upon access after sync (#14853)
Georgi Gerganov committed
July 17, 2025
llama : reuse compute graphs (#14482)
Georgi Gerganov committed
July 16, 2025
llama : add high-throughput mode (#14363)
Georgi Gerganov committed
Support diffusion models: Add Dream 7B (#14644)
Aman Gupta committed
llama: add LLAMA_API to deprecated llama_kv_self_seq_div (#14708)
Min-Hua committed