COMMITS
backend/cpp/llama-cpp/grpc-server.cpp
May 21, 2026
feat(llama-cpp): make server-side prompt cache work by default (#9925)
LocalAI [bot] committed
fix(llama-cpp): terminate tensor_buft_overrides with sentinel (#9919)
Richard Palethorpe committed
May 14, 2026
feat(llama-cpp): expose 12 missing common_params via options[] (#9814)
LocalAI [bot] committed
May 12, 2026
feat(llama-cpp): bump to `1ec7ba0c`, adapt grpc-server, expose new spec-decoding options (#9765)
LocalAI [bot] committed
April 30, 2026
feat(llama-cpp): bump to d775992 and adapt to spec params refactor (#9618)
Ettore Di Giacinto committed
April 25, 2026
feat(llama-cpp): expose split_mode option for multi-GPU placement (#9560)
Ettore Di Giacinto committed
April 23, 2026
fix(llama-cpp): include server-chat.cpp in grpc-server translation unit (#9511)
Ettore Di Giacinto committed
April 18, 2026
fix(vision): propagate mtmd media marker from backend via ModelMetadata (#9412)
Ettore Di Giacinto committed
April 14, 2026
feat: wire transcription for llama.cpp, add streaming support (#9353)
Ettore Di Giacinto committed
April 10, 2026
fix(streaming): skip chat deltas for role-init elements to prevent first token duplication (#9299)
Ettore Di Giacinto committed
April 9, 2026
chore(llama.cpp): bump to 'd12cc3d1ca6bba741cd77887ac9c9ee18c8415c7' (#9282)
Ettore Di Giacinto committed
April 6, 2026
fix(chat): do not retry if we had chatdeltas or tooldeltas from backend (#9244)
Ettore Di Giacinto committed
April 5, 2026
feat(llama.cpp): wire speculative decoding settings (#9238)
Ettore Di Giacinto committed
April 4, 2026
fix(reasoning): suppress partial tag tokens during autoparser warm-up
Ettore Di Giacinto committed
April 3, 2026
fix(llama.cpp): correctly parse grpc header for bearer token auth
Ettore Di Giacinto committed
March 29, 2026
feat: add distributed mode (#9124)
Ettore Di Giacinto committed
March 21, 2026
feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092)
Ettore Di Giacinto committed
March 20, 2026
chore(deps): bump llama-cpp to 'a0bbcdd9b6b83eeeda6f1216088f42c33d464e38' (#9079)
Ettore Di Giacinto committed
March 12, 2026
fix(llama-cpp): Set enable_thinking in the correct place (#8973)
Richard Palethorpe committed
March 8, 2026
feat(functions): add peg-based parsing and allow backends to return tool calls directly (#8838)
Ettore Di Giacinto committed
March 5, 2026
feat: pass-by metadata to predict options (#8795)
Ettore Di Giacinto committed
February 27, 2026
chore(deps): bump llama.cpp to 'ecbcb7ea9d3303097519723b264a8b5f1e977028' (#8672)
Ettore Di Giacinto committed
February 17, 2026
fix(llama-cpp): Pass parameters when using embedded template (#8590)
Richard Palethorpe committed
February 14, 2026
January 28, 2026
chore(llama.cpp): bump to 'f6b533d898ce84bae8d9fa8dfc6697ac087800bf' (#8275)
Ettore Di Giacinto committed
January 22, 2026
feat: detect thinking support from backend automatically if not explicitly set (#8167)
Ettore Di Giacinto committed
January 20, 2026
chore(deps): Bump llama.cpp to '1c7cf94b22a9dc6b1d32422f72a627787a4783a3' (#8136)
Ettore Di Giacinto committed
January 9, 2026
chore(llama.cpp): propagate errors during model load (#7937)
Ettore Di Giacinto committed
chore(deps): Bump llama.cpp to '480160d47297df43b43746294963476fc0a6e10f' (#7933)
Ettore Di Giacinto committed
January 2, 2026
fix(llama.cpp/mmproj): fix loading mmproj in nested sub-dirs different from model path (#7832)
Ettore Di Giacinto committed
December 23, 2025
chore(deps): Bump llama.cpp to '5b6c9bc0f3c8f55598b9999b65aff7ce4119bc15' and refactor usage of base params (#7706)
Ettore Di Giacinto committed
December 22, 2025
chore(deps): bump llama.cpp to '0e1ccf15c7b6d05c720551b537857ecf6194d420' (#7684)
Ettore Di Giacinto committed
December 15, 2025
chore(llama.cpp): Add Missing llama.cpp Options to gRPC Server (#7584)
Ettore Di Giacinto committed
December 14, 2025
fix(7355): Update llama-cpp grpc for v3 interface (#7566)
Simon Redman committed
December 12, 2025
fix(llama.cpp): handle corner cases with tool array content (#7528)
Ettore Di Giacinto committed
December 9, 2025
chore(deps/llama-cpp): bump to '2fa51c19b028180b35d316e9ed06f5f0f7ada2c1' (#7484)
Ettore Di Giacinto committed
December 4, 2025
chore(deps): bump llama.cpp to 'bde188d60f58012ada0725c6dd5ba7c69fe4dd87' (#7434)
Ettore Di Giacinto committed
December 1, 2025
chore: :arrow_up: Update ggml-org/llama.cpp to `7f8ef50cce40e3e7e4526a3696cb45658190e69a` (#7402)
Ettore Di Giacinto committed
November 29, 2025
chore(deps): bump llama.cpp to 'd82b7a7c1d73c0674698d9601b1bbb0200933f29' (#7392)
Ettore Di Giacinto committed
November 26, 2025
chore(deps): bump llama.cpp to '583cb83416467e8abf9b37349dcf1f6a0083745a (#7358)
Ettore Di Giacinto committed
November 21, 2025
fix(llama.cpp): handle corner cases with tool content (#7324)
Ettore Di Giacinto committed
November 16, 2025
feat: add support to logitbias and logprobs (#7283)
Ettore Di Giacinto committed
November 14, 2025
fix: handle tool errors (#7271)
Ettore Di Giacinto committed
chore(deps): bump llama.cpp to `c4abcb2457217198efdd67d02675f5fddb7071c2` (#7266)
Ettore Di Giacinto committed
November 12, 2025
feat: import models via URI (#7245)
Ettore Di Giacinto committed
fix(reranker): llama-cpp sort score desc, crop top_n (#7211)
Mikhail Khludnev committed
November 9, 2025
feat: respect context and add request cancellation (#7187)
Ettore Di Giacinto committed
November 7, 2025
feat(llama.cpp): consolidate options and respect tokenizer template when enabled (#7120)
Ettore Di Giacinto committed
November 2, 2025
feat(llama.cpp): allow to set cache-ram and ctx_shift (#7009)
Ettore Di Giacinto committed