COMMITS
/ core/http/endpoints/openai/inference.go
April 9, 2026
fix: thinking models with tools returning empty content (reasoning-only retry loop) (#9290)
Ettore Di Giacinto committed
April 6, 2026
fix(chat): do not retry if we had chatdeltas or tooldeltas from backend (#9244)
Ettore Di Giacinto committed
March 29, 2026
feat: add distributed mode (#9124)
Ettore Di Giacinto committed
March 16, 2026
chore: refactor endpoints to use same inferencing path, add automatic retrial mechanism in case of errors (#9029)
Ettore Di Giacinto committed
March 8, 2026
feat(functions): add peg-based parsing and allow backends to return tool calls directly (#8838)
Ettore Di Giacinto committed
March 5, 2026
feat: pass-by metadata to predict options (#8795)
Ettore Di Giacinto committed
November 16, 2025
feat: add support to logitbias and logprobs (#7283)
Ettore Di Giacinto committed
November 7, 2025
feat(llama.cpp): consolidate options and respect tokenizer template when enabled (#7120)
Ettore Di Giacinto committed
August 14, 2025
feat(backends): add system backend, refactor (#6059)
Ettore Di Giacinto committed
June 29, 2025
fix(gallery): automatically install model from name (#5757)
Ettore Di Giacinto committed
February 10, 2025
feat: Centralized Request Processing middleware (#3847)
Dave committed
January 17, 2025
feat: add machine tag and inference timings (#4577)
mintyleaf committed
September 19, 2024
feat(api): allow to pass audios to backends (#3603)
Ettore Di Giacinto committed
feat(api): allow to pass videos to backends (#3601)
Ettore Di Giacinto committed
June 23, 2024
chore: fix go.mod module (#2635)
Sertaç Özercan committed
April 17, 2024
Revert #1963 (#2056)
Ettore Di Giacinto committed
April 13, 2024
April 11, 2024
feat: use tokenizer.apply_chat_template() in vLLM (#1990)
Ludovic Leroux committed
March 1, 2024
refactor: move remaining api packages to core (#1731)
Dave committed