COMMITS
tools/server/server-context.cpp
April 4, 2026
server: Fix undefined timing measurement errors in server context (#21201)
Dan Hoffman committed
April 3, 2026
server: save and clear idle slots on new task (`--clear-idle`) (#20993)
Yes You Can Have Your Own committed
March 28, 2026
server : fix processing of multiple back-to-back mtmd chunks (#21107)
Georgi Gerganov committed
March 22, 2026
server: allow router to report child instances sleep status (#20849)
Xuan-Son Nguyen committed
March 20, 2026
server : improve mtmd ctx checkpoints (#20726)
Georgi Gerganov committed
March 19, 2026
server: Add cached_tokens info to oaicompat responses (#19361)
Ryan Goulden committed
common/parser: add proper reasoning tag prefill reading (#20424)
Piotr Wilkin (ilintar) committed
March 17, 2026
common/parser: add `--skip-chat-parsing` to force a pure content parser. (#20289)
Piotr Wilkin (ilintar) committed
server : fix ctx checkpoint invalidation (#20671)
Georgi Gerganov committed
March 13, 2026
server: reset counter related to kill-switch on client error (#20513)
SoftwareRenderer committed
March 11, 2026
common/parser: handle reasoning budget (#20297)
Piotr Wilkin (ilintar) committed
March 10, 2026
server : make 2 checkpoints near the end of the prompt (#20288)
Georgi Gerganov committed
March 9, 2026
server : fix checkpoints n_tokens calculation (#20287)
Georgi Gerganov committed
server : warn swa-full is not supported for non-SWA models (#20291)
Georgi Gerganov committed
server : fix off-by-1 in server_tokens::size_up_to_pos() (#20279)
Georgi Gerganov committed
server : add kill switch when server is stuck (#20277)
Georgi Gerganov committed
March 8, 2026
server : do not create checkpoints right after mtmd chunks (#20232)
Georgi Gerganov committed
March 6, 2026
Checkpoint every n tokens: squash (#20087)
Piotr Wilkin (ilintar) committed
February 27, 2026
February 26, 2026
server : fix ctx checkpoint restore logic (#19924)
Georgi Gerganov committed
February 25, 2026
server : enable multi-modal prompt caching (#19877)
Georgi Gerganov committed
server : support multi-modal context checkpoints (#19849)
Georgi Gerganov committed
February 22, 2026
cli : provide model with text filename (#19783)
Sigbjørn Skjæret committed
February 18, 2026
February 9, 2026
February 8, 2026
server : improve context checkpoint logic (#19408)
Georgi Gerganov committed
February 6, 2026
common : add common_speculative_is_compat() (#19270)
Georgi Gerganov committed
January 30, 2026
server : wrap around the "id_slot" parameter (#19207)
Georgi Gerganov committed
spec : add ngram-mod (#19164)
Georgi Gerganov committed
January 28, 2026
spec : add self‑speculative decoding (no draft model required) + refactor (#18471)
Sascha Rogmann committed
January 22, 2026
server : support preserving reasoning_content in assistant message (#18994)
Xuan-Son Nguyen committed
January 21, 2026
server: /v1/responses (partial) (#18486)
손희준 committed
January 19, 2026
server : refactor oai_parser_opt, move it to server_chat_params (#18937)
Xuan-Son Nguyen committed
server: fix memory reservations in populate_token_probs (#18787)
Lennart Austenfeld committed
January 16, 2026
common : implement new jinja template engine (#18462)
Xuan-Son Nguyen committed
January 15, 2026
server: improve slots scheduling for n_cmpl (#18789)
Xuan-Son Nguyen committed
context : reserve new scheduler when graph topology changes (#18547)
Georgi Gerganov committed
January 9, 2026
server: fix n_cmpl not skipping processing prompt (#18663)
Xuan-Son Nguyen committed
server : fix timing of prompt/generation (#18713)
Georgi Gerganov committed
server : use different seeds for child completions (#18700)
Georgi Gerganov committed
January 5, 2026
model : add LFM2-ColBert-350M (#18607)
Tarek Dakhran committed
January 4, 2026
sampling : add support for backend sampling (#17004)
Daniel Bevenius committed
December 29, 2025
server : handle closed connection for tasks (#18459)
Georgi Gerganov committed
December 26, 2025
December 23, 2025
server: return_progress to also report 0% processing state (#18305)
Xuan-Son Nguyen committed
server: fix crash with model not having BOS/EOS (#18321)
Xuan-Son Nguyen committed
December 22, 2025
server: prevent data race from HTTP threads (#18263)
Xuan-Son Nguyen committed
December 21, 2025
server: add auto-sleep after N seconds of idle (#18228)
Xuan-Son Nguyen committed
December 20, 2025
server : [easy] fix per round speculative decode logging (#18211)
Oleksandr Kuvshynov committed
December 19, 2025
server: friendlier error msg when ctx < input (#18174)
Aman Gupta committed