Commits: tools/server/server-task.cpp - ggml-org/llama.cpp - Morph

ggml-org / llama.cpp

LLM inference in C/C++

COMMITS

/ tools/server/server-task.cpp

master

March 27, 2026

common : inhibit lazy grammar sampler while reasoning is active (#20970)

Aldehir Rojas committed 7d ago

March 20, 2026

common/parser: fix nasty bug causing subtle corruption of generation prompt (#20825)

Piotr Wilkin (ilintar) committed 14d ago

March 19, 2026

server: Add cached_tokens info to oaicompat responses (#19361)

Ryan Goulden committed 15d ago

common/parser: add proper reasoning tag prefill reading (#20424)

Piotr Wilkin (ilintar) committed 15d ago

March 11, 2026

common/parser: handle reasoning budget (#20297)

Piotr Wilkin (ilintar) committed 24d ago

March 8, 2026

server : correct index on finish in OAI completion streams (#20226)

decahedron1 committed 27d ago

March 6, 2026

Autoparser - complete refactoring of parser architecture (#18675)

Piotr Wilkin (ilintar) committed 28d ago

February 25, 2026

server : enable multi-modal prompt caching (#19877)

Georgi Gerganov committed 1mo ago

February 24, 2026

server : support max_completion_tokens request property (#19831)

Radoslav Gerganov committed 1mo ago

February 9, 2026

spec : remove check rate (#19377)

Sascha Rogmann committed 1mo ago

January 28, 2026

spec : add self‑speculative decoding (no draft model required) + refactor (#18471)

Sascha Rogmann committed 2mo ago

January 22, 2026

server : support preserving reasoning_content in assistant message (#18994)

Xuan-Son Nguyen committed 2mo ago

server: Reorder methods in `server-task.cpp` (#19016)

손희준 committed 2mo ago

January 21, 2026

server: /v1/responses (partial) (#18486)

손희준 committed 2mo ago

January 20, 2026

cli : fix reasoning responses in CLI (#18961)

Xuan-Son Nguyen committed 2mo ago

January 15, 2026

llama : add adaptive-p sampler (#17927)

ddh0 committed 2mo ago

January 12, 2026

server : add arg for disabling prompt caching (#18776)

Radoslav Gerganov committed 2mo ago

January 6, 2026

server : add thinking content blocks to Anthropic Messages API (#18551)

R committed 2mo ago

January 4, 2026

sampling : add support for backend sampling (#17004)

Daniel Bevenius committed 2mo ago

December 22, 2025

server: prevent data race from HTTP threads (#18263)

Xuan-Son Nguyen committed 3mo ago

server: fix data race in to_json_anthropic (#18283)

Xuan-Son Nguyen committed 3mo ago

December 8, 2025

server : make cache_reuse configurable per request (#17858)

Georgi Gerganov committed 3mo ago

December 6, 2025

server: support multiple generations from one prompt (OAI "n" option) (#17775)

Xuan-Son Nguyen committed 3mo ago

December 4, 2025

server: move msg diffs tracking to HTTP thread (#17740)

Xuan-Son Nguyen committed 4mo ago

December 3, 2025

common : introduce composable PEG parser combinators for chat parsing (#17136)

Aldehir Rojas committed 4mo ago

December 2, 2025

server: remove default "gpt-3.5-turbo" model name (#17668)

Xuan-Son Nguyen committed 4mo ago

November 28, 2025

server : add Anthropic Messages API support (#17570)

Fredrik Hultin committed 4mo ago

November 24, 2025

server: split server.cpp code into server/common/task/queue (#17362)

Xuan-Son Nguyen committed 4mo ago