COMMITS
tools/server/server-task.cpp
March 27, 2026
common : inhibit lazy grammar sampler while reasoning is active (#20970)
Aldehir Rojas committed
March 20, 2026
common/parser: fix nasty bug causing subtle corruption of generation prompt (#20825)
Piotr Wilkin (ilintar) committed
March 19, 2026
server: Add cached_tokens info to oaicompat responses (#19361)
Ryan Goulden committed
common/parser: add proper reasoning tag prefill reading (#20424)
Piotr Wilkin (ilintar) committed
March 11, 2026
common/parser: handle reasoning budget (#20297)
Piotr Wilkin (ilintar) committed
March 8, 2026
server : correct index on finish in OAI completion streams (#20226)
decahedron1 committed
March 6, 2026
Autoparser - complete refactoring of parser architecture (#18675)
Piotr Wilkin (ilintar) committed
February 25, 2026
server : enable multi-modal prompt caching (#19877)
Georgi Gerganov committed
February 24, 2026
server : support max_completion_tokens request property (#19831)
Radoslav Gerganov committed
February 9, 2026
spec : remove check rate (#19377)
Sascha Rogmann committed
January 28, 2026
spec : add self‑speculative decoding (no draft model required) + refactor (#18471)
Sascha Rogmann committed
January 22, 2026
server : support preserving reasoning_content in assistant message (#18994)
Xuan-Son Nguyen committed
server: Reorder methods in `server-task.cpp` (#19016)
손희준 committed
January 21, 2026
server: /v1/responses (partial) (#18486)
손희준 committed
January 20, 2026
cli : fix reasoning responses in CLI (#18961)
Xuan-Son Nguyen committed
January 15, 2026
llama : add adaptive-p sampler (#17927)
ddh0 committed
January 12, 2026
server : add arg for disabling prompt caching (#18776)
Radoslav Gerganov committed
January 6, 2026
January 4, 2026
sampling : add support for backend sampling (#17004)
Daniel Bevenius committed
December 22, 2025
server: prevent data race from HTTP threads (#18263)
Xuan-Son Nguyen committed
server: fix data race in to_json_anthropic (#18283)
Xuan-Son Nguyen committed
December 8, 2025
server : make cache_reuse configurable per request (#17858)
Georgi Gerganov committed
December 6, 2025
server: support multiple generations from one prompt (OAI "n" option) (#17775)
Xuan-Son Nguyen committed
December 4, 2025
server: move msg diffs tracking to HTTP thread (#17740)
Xuan-Son Nguyen committed
December 3, 2025
common : introduce composable PEG parser combinators for chat parsing (#17136)
Aldehir Rojas committed
December 2, 2025
server: remove default "gpt-3.5-turbo" model name (#17668)
Xuan-Son Nguyen committed
November 28, 2025
server : add Anthropic Messages API support (#17570)
Fredrik Hultin committed
November 24, 2025
server: split server.cpp code into server/common/task/queue (#17362)
Xuan-Son Nguyen committed