COMMITS
core/backend/options.go
May 30, 2026
feat: prefix-cache-aware routing for distributed mode (#10071)
LocalAI [bot] committed
May 25, 2026
feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)
Richard Palethorpe committed
May 23, 2026
fix(traces): cap backend trace Data to keep admin UI responsive (#9960)
LocalAI [bot] committed
May 22, 2026
feat(config): default prompt_cache_all to true (#9951)
LocalAI [bot] committed
May 5, 2026
fix(backend): resolve relative draft_model paths against the models dir (#9680)
LocalAI [bot] committed
April 28, 2026
feat(vllm): expose AsyncEngineArgs via generic engine_args YAML map (#9563)
Richard Palethorpe committed
March 31, 2026
feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler (#9186)
Ettore Di Giacinto committed
March 29, 2026
feat: add distributed mode (#9124)
Ettore Di Giacinto committed
March 21, 2026
feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092)
Ettore Di Giacinto committed
March 18, 2026
feat(ui): Per model backend logs and various fixes (#9028)
Richard Palethorpe committed
March 15, 2026
fix: Automatically disable mmap for Intel SYCL backends (#9012) (#9015)
LocalAI [bot] committed
March 5, 2026
feat: pass-by metadata to predict options (#8795)
Ettore Di Giacinto committed
January 2, 2026
fix(llama.cpp/mmproj): fix loading mmproj in nested sub-dirs different from model path (#7832)
Ettore Di Giacinto committed
December 21, 2025
chore(refactor): move logging to common package based on slog (#7668)
Ettore Di Giacinto committed
November 16, 2025
feat: add support to logitbias and logprobs (#7283)
Ettore Di Giacinto committed
October 10, 2025
fix(llama.cpp): correctly set grammar triggers (#6432)
Ettore Di Giacinto committed
August 31, 2025
feat(flash_attention): set auto for flash_attention in llama.cpp (#6168)
Ettore Di Giacinto committed
August 14, 2025
feat(backends): add system backend, refactor (#6059)
Ettore Di Giacinto committed
July 22, 2025
feat: refactor build process, drop embedded backends (#5875)
Ettore Di Giacinto committed
June 28, 2025
feat(llama.cpp): allow to set kv-overrides (#5745)
Ettore Di Giacinto committed
May 22, 2025
feat(llama.cpp): add reranking (#5396)
Ettore Di Giacinto committed
May 3, 2025
chore(defaults): enlarge defaults, drop gpu layers which is infered (#5308)
Ettore Di Giacinto committed
April 19, 2025
chore(autogptq): drop archived backend (#5214)
Ettore Di Giacinto committed
April 1, 2025
feat(loader): enhance single active backend by treating as singleton (#5107)
Ettore Di Giacinto committed
March 5, 2025
chore(deps): update llama.cpp and sync with upstream changes (#4950)
Ettore Di Giacinto committed
February 18, 2025
February 2, 2025
feat(llama.cpp): Add support to grammar triggers (#4733)
Ettore Di Giacinto committed
January 17, 2025
chore(vall-e-x): Drop backend (#4619)
Ettore Di Giacinto committed
December 6, 2024
feat(llama.cpp): expose cache_type_k and cache_type_v for quant of kv cache (#4329)
Ettore Di Giacinto committed
December 3, 2024
feat(backend): add stablediffusion-ggml (#4289)
Ettore Di Giacinto committed
November 8, 2024
chore(refactor): drop unnecessary code in loader (#4096)
Ettore Di Giacinto committed
November 5, 2024
feat(diffusers): allow multiple lora adapters (#4081)
Ettore Di Giacinto committed
October 23, 2024
feat(vllm): expose 'load_format' (#3943)
Ettore Di Giacinto committed
October 2, 2024
feat: track internally started models by ID (#3693)
Ettore Di Giacinto committed
September 22, 2024
feat: auto load into memory on startup (#3627)
Sertaç Özercan committed
July 15, 2024
feat(llama.cpp): support embeddings endpoints (#2871)
Ettore Di Giacinto committed
June 26, 2024
feat(options): add `repeat_last_n` (#2660)
Ettore Di Giacinto committed
June 23, 2024
chore: fix go.mod module (#2635)
Sertaç Özercan committed
May 13, 2024
feat(llama.cpp): add `flash_attention` and `no_kv_offloading` (#2310)
Ettore Di Giacinto committed
April 26, 2024
April 25, 2024
April 20, 2024
Add tensor_parallel_size setting to vllm setting items (#2085)
Taikono-Himazin committed
April 17, 2024
Revert #1963 (#2056)
Ettore Di Giacinto committed
April 13, 2024
April 6, 2024
fix(llama.cpp): set better defaults for llama.cpp (#1961)
Ettore Di Giacinto committed
April 3, 2024
fix(seed): generate random seed per-request if -1 is set (#1952)
Ettore Di Giacinto committed
March 13, 2024
fix(config): set better defaults for inferencing (#1822)
Ettore Di Giacinto committed
March 7, 2024
feat(intel): add diffusers/transformers support (#1746)
Ettore Di Giacinto committed
March 1, 2024
Bump vLLM version + more options when loading models in vLLM (#1782)
Ludovic Leroux committed
refactor: move remaining api packages to core (#1731)
Dave committed