COMMITS
/ examples/batched/batched.cpp
March 31, 2026
common : move up common_init() and fix Windows UTF-8 logs (#21176)
Adrien Gallouët committed
March 4, 2026
Fix locale-dependent float printing in GGUF metadata (#17331)
SamareshSingh committed
January 15, 2026
context : reserve new scheduler when graph topology changes (#18547)
Georgi Gerganov committed
January 12, 2026
examples : add --kv-unified to batched example (#18774)
Daniel Bevenius committed
January 4, 2026
sampling : add support for backend sampling (#17004)
Daniel Bevenius committed
December 14, 2025
common : refactor common_sampler + grammar logic changes (#17937)
Georgi Gerganov committed
April 1, 2025
common : refactor downloading system, handle mmproj with -hf option (#12694)
Xuan-Son Nguyen committed
January 12, 2025
llama : add `llama_vocab`, functions -> methods, naming (#11110)
Georgi Gerganov committed
January 6, 2025
llama : update llama_model API names (#11063)
Georgi Gerganov committed
llama : use LLAMA_TOKEN_NULL (#11062)
Georgi Gerganov committed
December 16, 2024
sampling : refactor + optimize penalties sampler (#10803)
Georgi Gerganov committed
November 25, 2024
speculative : refactor and add a simpler example (#10362)
Georgi Gerganov committed
October 10, 2024
common : use common_ prefix for common library functions (#9805)
Diego Devesa committed
September 15, 2024
common : reimplement logging (#9418)
Georgi Gerganov committed
September 13, 2024
llama : llama_perf + option to disable timings during decode (#9355)
Georgi Gerganov committed
September 9, 2024
common : move arg parser code to `arg.cpp` (#9388)
Xuan Son Nguyen committed
llama : minor sampling refactor (2) (#9386)
slaren committed
September 7, 2024
common : refactor arg parser (#9308)
Xuan Son Nguyen committed
llama : refactor sampling v2 (#9294)
Georgi Gerganov committed
July 17, 2024
batched: fix n_predict parameter (#8527)
Masaya, Kato committed
July 4, 2024
Inference support for T5 and FLAN-T5 model families (#5763)
fairydreaming committed
June 4, 2024
common : refactor cli arg parsing (#7675)
Georgi Gerganov committed
May 22, 2024
common : normalize naming style (#7462)
Georgi Gerganov committed
April 21, 2024
llama : support Llama 3 HF conversion (#6745)
Pedro Cuenca committed
March 22, 2024
metal : pad n_ctx by 32 (#6177)
Georgi Gerganov committed
March 11, 2024
llama : more consistent names of count variables (#5994)
Georgi Gerganov committed
March 8, 2024
llama : support Mamba Selective State Space Models (#5328)
compilade committed
February 18, 2024
ggml, common, examples, tests : fixed type arguments in printf (#5528)
Herman Semenov committed
February 16, 2024
ggml : add numa options (#5377)
bmwl committed
January 8, 2024
examples : add passkey test (#3856)
Georgi Gerganov committed
October 24, 2023
cuda : add batched cuBLAS GEMM for faster attention (#3749)
Georgi Gerganov committed
October 23, 2023
llama : remove token functions with `context` args in favor of `model` (#3720)
Marcus Dunn committed
October 22, 2023
batched : add len CLI argument
Georgi Gerganov committed
October 18, 2023
speculative : add tree-based sampling example (#3624)
Georgi Gerganov committed
October 11, 2023
batched : add bench tool (#3545)
Georgi Gerganov committed
September 28, 2023
llama : custom attention mask + parallel decoding + no context swaps (#3228)
Georgi Gerganov committed