COMMITS
common/common.h
March 28, 2026
cli : add /glob command (#21084)
Sigbjørn Skjæret committed
server : add custom socket options to disable SO_REUSEPORT (#21056)
Adrien Gallouët committed
March 27, 2026
server: add built-in tools backend support (#20898)
Xuan-Son Nguyen committed
March 19, 2026
common/parser: add proper reasoning tag prefill reading (#20424)
Piotr Wilkin (ilintar) committed
March 17, 2026
common/parser: add `--skip-chat-parsing` to force a pure content parser. (#20289)
Piotr Wilkin (ilintar) committed
March 12, 2026
March 11, 2026
common/parser: handle reasoning budget (#20297)
Piotr Wilkin (ilintar) committed
March 8, 2026
llama: end-to-end tests (#19802)
Johannes Gäßler committed
March 6, 2026
Checkpoint every n tokens: squash (#20087)
Piotr Wilkin (ilintar) committed
webui: Agentic Loop + MCP Client with support for Tools, Resources and Prompts (#18655)
Aleksander Grygier committed
March 5, 2026
chore : correct typos [no ci] (#20041)
Marcel Petrick committed
February 27, 2026
February 23, 2026
llama : remove write/read of output ids/logits/embeddings (#18862)
Daniel Bevenius committed
February 18, 2026
common : make small string helpers as inline functions (#19693)
Adrien Gallouët committed
February 16, 2026
common : inline functions (#18639)
Ivan Chikish committed
February 11, 2026
common : remove unused token util functions (#19506)
Daniel Bevenius committed
February 9, 2026
spec : remove check rate (#19377)
Sascha Rogmann committed
January 30, 2026
spec : add ngram-mod (#19164)
Georgi Gerganov committed
January 28, 2026
spec : add self‑speculative decoding (no draft model required) + refactor (#18471)
Sascha Rogmann committed
llama : disable Direct IO by default (#19109)
Georgi Gerganov committed
January 20, 2026
common, server : use the same User-Agent by default (#18957)
Adrien Gallouët committed
cli : fix reasoning responses in CLI (#18961)
Xuan-Son Nguyen committed
January 15, 2026
llama : add adaptive-p sampler (#17927)
ddh0 committed
January 12, 2026
server : add arg for disabling prompt caching (#18776)
Radoslav Gerganov committed
examples : add --kv-unified to batched example (#18774)
Daniel Bevenius committed
January 8, 2026
llama-fit-params: free memory target per device (#18679)
Johannes Gäßler committed
llama : add `use_direct_io` flag for model loading (#18166)
Julius Tischbein committed
January 7, 2026
examples : add debug utility/example (#18464)
Daniel Bevenius committed
January 4, 2026
sampling : add support for backend sampling (#17004)
Daniel Bevenius committed
December 27, 2025
llama: fix magic number of 999 for GPU layers (#18266)
Johannes Gäßler committed
December 21, 2025
server: add auto-sleep after N seconds of idle (#18228)
Xuan-Son Nguyen committed
December 17, 2025
server: (webui) add --webui-config (#18028)
Pascal committed
December 15, 2025
December 14, 2025
common : refactor common_sampler + grammar logic changes (#17937)
Georgi Gerganov committed
December 10, 2025
cli: enable jinja by default (#17911)
Xuan-Son Nguyen committed
server: add presets (config) when using multiple models (#17859)
Pascal committed
cli: new CLI experience (#17824)
Xuan-Son Nguyen committed
December 7, 2025
common : change --color to accept on/off/auto, default to auto (#17827)
Sigbjørn Skjæret committed
December 4, 2025
build : move _WIN32_WINNT definition to headers (#17736)
Adrien Gallouët committed
December 2, 2025
server: add --media-path for local media files (#17697)
Xuan-Son Nguyen committed
December 1, 2025
server: introduce API for serving / loading / unloading multiple models (#17470)
Xuan-Son Nguyen committed
common: improve verbosity level definitions (#17630)
Xuan-Son Nguyen committed
November 25, 2025
llama: introduce support for model-embedded sampling parameters (#17120)
Aaron Teo committed
November 20, 2025
common : more accurate sampling timing (#17382)
Georgi Gerganov committed
November 10, 2025
batched-bench : add "separate text gen" mode (#17103)
Georgi Gerganov committed
November 8, 2025
arg: add --cache-list argument to list cached models (#17073)
Xuan-Son Nguyen committed
November 5, 2025
server : do not default to multiple slots with speculative decoding (#17017)
Georgi Gerganov committed
November 3, 2025
mtmd: add --image-min/max-tokens (#16921)
Xuan-Son Nguyen committed
October 12, 2025
common : update presets (#16504)
Georgi Gerganov committed