Commits: include/llama.h - ggml-org/llama.cpp

ggml-org / llama.cpp

LLM inference in C/C++

COMMITS

/ include/llama.h

b8522

March 25, 2026

llama: fix llama-model-saver (#20503)

Johannes Gäßler committed 2mo ago

36dafba

March 18, 2026

llama : re-enable manual LoRA adapter free (#19983)

Pop Flamingo committed 3mo ago

312cf03

March 11, 2026

ggml : add NVFP4 quantization type support (#19769)

Richard Davison committed 3mo ago

5eae9cb

March 8, 2026

llama: end-to-end tests (#19802)

Johannes Gäßler committed 3mo ago

a976ff0

March 5, 2026

chore : correct typos [no ci] (#20041)

Marcel Petrick committed 3mo ago

92f7da0

February 20, 2026

quantize : add --dry-run option (#19526)

ddh0 committed 3mo ago

492bc31

February 14, 2026

llama : update LoRA API. + fix excessive graph reserves (#19280)

agent-enemy-2 committed 4mo ago

2d8015e

February 12, 2026

llama : update outdated comment in llama.h (#19428)

Christian Schmitz committed 4mo ago

f488429

February 11, 2026

llama : correct typos 'occured' and 'occurences' (#19414)

thecaptain789 committed 4mo ago

8ee538c

January 28, 2026

llama : disable Direct IO by default (#19109)

Georgi Gerganov committed 4mo ago

c5c64f7

January 25, 2026

llama: fix integer type consistency in split helpers (#18894)

Jakkala Mahesh committed 4mo ago

24bc238

January 24, 2026

llama-fit-params: keep explicit --ctx-size 0 (#19070)

Johannes Gäßler committed 4mo ago

e9fd8dc

January 15, 2026

llama : add adaptive-p sampler (#17927)

ddh0 committed 5mo ago

13f1e4a

context : reserve new scheduler when graph topology changes (#18547)

Georgi Gerganov committed 5mo ago

39173bc

lora: make sure model keep track of associated adapters (#18490)

Xuan-Son Nguyen committed 5mo ago

a7e6ddb

January 9, 2026

server : use different seeds for child completions (#18700)

Georgi Gerganov committed 5mo ago

f5f8812

January 8, 2026

llama-fit-params: free memory target per device (#18679)

Johannes Gäßler committed 5mo ago

64848de

llama : add `use_direct_io` flag for model loading (#18166)

Julius Tischbein committed 5mo ago

2038101

January 5, 2026

model : add LFM2-ColBert-350M (#18607)

Tarek Dakhran committed 5mo ago

73d284a

January 4, 2026

sampling : add support for backend sampling (#17004)

Daniel Bevenius committed 5mo ago

d3dce4e

December 30, 2025

lora: count lora nodes in graph_max_nodes (#18469)

Xuan-Son Nguyen committed 5mo ago

cd78e57

December 27, 2025

llama: fix magic number of 999 for GPU layers (#18266)

Johannes Gäßler committed 5mo ago

026d2ad

llama_fit_params: return enum for fail vs. error (#18374)

Johannes Gäßler committed 5mo ago

a52dc60

December 15, 2025

llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)

Johannes Gäßler committed 6mo ago

b1f3a6e

November 25, 2025

llama: introduce support for model-embedded sampling parameters (#17120)

Aaron Teo committed 6mo ago

877566d

November 7, 2025

hparams : add n_embd_inp() to support extended embed (#16928)

Sigbjørn Skjæret committed 7mo ago

9008027

kv-cache : pad the cache size to 256 for performance (#17046)

Georgi Gerganov committed 7mo ago

16bcc12

November 2, 2025

server : support unified cache across slots (#16736)

Georgi Gerganov committed 7mo ago

cd5e3b5

docs: remove llama_sampler_accept reference in sampling sample usage (#16920)

Adrian Lundberg committed 7mo ago

76af40a

October 30, 2025

model: add support for qwen3vl series (#16780)

JJJYmmm committed 7mo ago

d261223

October 6, 2025

llama : add --no-host to disable host buffers (#16310)

Gadflyii committed 8mo ago

3df2244

October 3, 2025

server : context checkpointing for hybrid and recurrent models (#16382)

ddh0 committed 8mo ago

f6dcda3

September 24, 2025

llama: print memory breakdown on exit (#15860)

Johannes Gäßler committed 8mo ago

e789095

September 5, 2025

aLoRA Support (#15327)

Gabe Goodhart committed 9mo ago

fd62188

August 31, 2025

sampling : optimize samplers by reusing bucket sort (#15665)

Georgi Gerganov committed 9mo ago

e92d53b

August 30, 2025

llama: use FA + max. GPU layers by default (#15434)

Johannes Gäßler committed 9mo ago

e81b8e4

August 28, 2025

model : jina-embeddings-v3 support (#13693)

Sigbjørn Skjæret committed 9mo ago

84ab83c

August 22, 2025

llama : remove KV cache defragmentation logic (#15473)

Georgi Gerganov committed 9mo ago

9ebebef

August 21, 2025

llama : remove deprecated llama_kv_self API (#15472)

Georgi Gerganov committed 9mo ago

cd36b5e

kv-cache : drop the "unified" prefix (#15467)

Georgi Gerganov committed 9mo ago

715a6db

August 14, 2025

server : add SWA checkpoints (#15293)

Georgi Gerganov committed 10mo ago

d32e03f

finetune: SGD optimizer, more CLI args (#13873)

Jonathan Graehl committed 10mo ago

5cdb27e

August 5, 2025

llama : add gpt-oss (#15091)

Georgi Gerganov committed 10mo ago

fd1234c

July 31, 2025

llama : allow other bufts when overriding to CPU, add --no-repack option (#14990)

Diego Devesa committed 10mo ago

d6818d0

Add LLaDA 8b Diffusion model (#14771)

Aman Gupta committed 10mo ago

8a4a856

July 24, 2025

context : perform output reorder lazily upon access after sync (#14853)

Georgi Gerganov committed 10mo ago

e4868d1

July 17, 2025

llama : reuse compute graphs (#14482)

Georgi Gerganov committed 11mo ago

01612b7

July 16, 2025

llama : add high-throughput mode (#14363)

Georgi Gerganov committed 11mo ago

225e7a1

Support diffusion models: Add Dream 7B (#14644)

Aman Gupta committed 11mo ago

ab14019

llama: add LLAMA_API to deprecated llama_kv_self_seq_div (#14708)

Min-Hua committed 11mo ago

79e0b68