COMMITS
src/llama-context.cpp

March 31, 2026
fix: correct misspellings in code comments (#21217)
lainon1 committed
March 24, 2026
ggml-backend: re-enable graph reuse with pipeline parallelism (#20927)
Aman Gupta committed
March 21, 2026
context : use n_embd_out for pooled embedding extraction (#20840)
Tom Hillbrunner committed
March 20, 2026
context: zero output buffer on allocation (#20781)
Ruikai Peng committed
March 18, 2026
context : fix graph not resetting when control vector changes (#20381)
Andreas Obersteiner committed
March 13, 2026
llama : fix pooling assertion crash in chunked GDN detection path (#20468)
ZeroV0LT committed
March 12, 2026
llama : disable graph reuse with pipeline parallelism (#20463)
Georgi Gerganov committed
March 11, 2026
llama : enable chunked fused GDN path (#20340)
Georgi Gerganov committed
March 9, 2026
llama: dynamic head_dim and n_rot for SWA (#20301)
Xuan-Son Nguyen committed
March 8, 2026
llama: end-to-end tests (#19802)
Johannes Gäßler committed
March 7, 2026
ggml: add GATED_DELTA_NET op (#19504)
Aman Gupta committed
March 6, 2026
context: ignore zero scale LoRAs when checking sameness (#20166)
Tim Neumann committed
March 5, 2026
chore : correct typos [no ci] (#20041)
Marcel Petrick committed
February 23, 2026
llama : remove write/read of output ids/logits/embeddings (#18862)
Daniel Bevenius committed
February 19, 2026
llama : use output_resolve_row() in get_logits_ith/get_embeddings_ith (#19663)
Daniel Bevenius committed
model : full modern bert support (#18330)
Ryan Mangeno committed
February 16, 2026
graph : fix KQ mask, lora, cvec reuse checks (#19644)
Georgi Gerganov committed
February 15, 2026
context : fix output reorder with backend sampling (#19638)
Georgi Gerganov committed
February 14, 2026
llama : update LoRA API. + fix excessive graph reserves (#19280)
agent-enemy-2 committed
February 11, 2026
llama : refactor sampling_info to use buffer_view template (#19368)
Daniel Bevenius committed
February 10, 2026
models : support qwen3.5 series (#19468)
JJJYmmm committed
February 9, 2026
revert : "[Model] Qwen3.5 dense and MoE support (no vision) (#19435)" (#19453)
Georgi Gerganov committed
February 8, 2026
[Model] Qwen3.5 dense and MoE support (no vision) (#19435)
Piotr Wilkin (ilintar) committed
February 6, 2026
Kimi-Linear support (backend agnostic + MLA KV cache) (#18755)
ymcki committed
February 3, 2026
sampling : delegate input allocation to the scheduler (#19266)
Georgi Gerganov committed
February 2, 2026
metal : support virtual devices (#18919)
Georgi Gerganov committed
January 28, 2026
sampling : remove sampling branching in output_reserve (#18811)
Daniel Bevenius committed
January 26, 2026
graph : fix nkvo offload with FA (#19105)
Georgi Gerganov committed
January 25, 2026
kv-cache : support V-less cache (#19067)
Georgi Gerganov committed
completion : fix prompt cache for recurrent models (#19045)
Georgi Gerganov committed
January 23, 2026
graph : utilize `ggml_build_forward_select()` to avoid reallocations (#18898)
Georgi Gerganov committed
January 15, 2026
context : do not reserve scheduler for warmups (#18867)
Georgi Gerganov committed
context : reserve new scheduler when graph topology changes (#18547)
Georgi Gerganov committed
lora: make sure model keep track of associated adapters (#18490)
Xuan-Son Nguyen committed
January 5, 2026
model : add LFM2-ColBert-350M (#18607)
Tarek Dakhran committed
January 4, 2026
sampling : add support for backend sampling (#17004)
Daniel Bevenius committed
January 3, 2026
context : fix reserve token padding to n_seqs (#18536)
Georgi Gerganov committed
December 30, 2025
lora: count lora nodes in graph_max_nodes (#18469)
Xuan-Son Nguyen committed
December 27, 2025
llama: fix magic number of 999 for GPU layers (#18266)
Johannes Gäßler committed
December 22, 2025
tool/ex/tests: consistently free ctx, then model (#18168)
Johannes Gäßler committed
December 15, 2025
December 14, 2025
models : fix YaRN regression + consolidate logic (#18006)
Georgi Gerganov committed
December 13, 2025
llama_context: synchronize before reallocating output buffer (#17974)
Jeff Bolz committed
December 10, 2025
ggml : remove GGML_KQ_MASK_PAD constant (#17910)
Georgi Gerganov committed
December 8, 2025
Make graph_max_nodes vary by ubatch size (#17794)
Piotr Wilkin (ilintar) committed
November 28, 2025
model : Qwen3 Next (#16095)
Piotr Wilkin (ilintar) committed
November 24, 2025
llama : skip output reordering for single token batches (#17466)
Daniel Bevenius committed
November 7, 2025
hparams : add n_embd_inp() to support extended embed (#16928)
Sigbjørn Skjæret committed