Commit Graph

  • 41361c8599 common : move up common_init() and fix Windows UTF-8 logs (#21176) Adrien Gallouët 2026-03-31 12:53:41 +02:00
  • 62278cedde sycl : enhance fattn perf (#21185) b8595 Neo Zhang 2026-03-31 18:31:50 +08:00
  • 90aa83c6bd common: add bounds check in common_init_result::sampler to prevent segfault on failed model load (#21082) mtmcp 2026-03-31 07:04:42 -03:00
  • fcc2d598c8 fix: include API key in CORS proxy requests for MCP connections (#21193) SATISH K C 2026-03-31 03:52:34 -05:00
  • 4453e77561 server/webui: cleanup dual representation approach, simplify to openai-compat (#21090) Piotr Wilkin (ilintar) 2026-03-31 10:42:06 +02:00
  • 26dac845cc vendor : update BoringSSL to 0.20260327.0 (#21211) b8591 Adrien Gallouët 2026-03-31 09:21:54 +02:00
  • 5ce013cd7e common : Disable backend sampling if reasoning budget is enabled (#21209) b8590 Galunid 2026-03-31 09:14:01 +02:00
  • 2985be3324 update hw info enhance_fa arthw 2026-03-31 09:24:40 +08:00
  • 08f21453ae opencl: add q4_K gemm and gemv kernels for Adreno (#20919) b8589 shaofeiqi 2026-03-30 12:19:16 -07:00
  • 84ae8434d0 CI : Enable CUDA and Vulkan ARM64 runners and fix CI/CD (#21122) Seungmin Kim 2026-03-31 03:24:37 +09:00
  • ead417f01c jinja : handle empty expressions correctly (#20913) b8587 Zhihao "Zephyr" Yao 2026-03-30 14:08:46 -04:00
  • 64ac9ab66a CUDA : Fix CUB's argsort when nrows % block_size == 0 CCCL < 3.1 (#21181) b8586 Oliver Simons 2026-03-30 16:20:00 +02:00
  • cad2d3884c rpc : fix misleading error log (#21184) b8585 Radoslav Gerganov 2026-03-30 17:05:11 +03:00
  • 389c7d4955 webui: Fix branching logic on edit message (#21175) Aleksander Grygier 2026-03-30 14:40:50 +02:00
  • 278521c33a llama-model-loader: print warning when using overrides with mmap (#20978) b8583 Aman Gupta 2026-03-30 17:40:17 +08:00
  • e2eb39e81c ci : bump ty to 0.0.26 (#21156) Sigbjørn Skjæret 2026-03-30 09:29:15 +02:00
  • abf9a62161 server: wrap headers for mcp proxy (#21072) b8581 Xuan-Son Nguyen 2026-03-30 08:59:16 +02:00
  • 7c203670f8 add missing ROPE_FACTORS_LONG/SHORT for MiniCPM (#21150) b8580 Sigbjørn Skjæret 2026-03-29 19:45:40 +02:00
  • ec16a072f0 Optimize MOE GEMV kernel for BS > 1. (#20905) b8579 Gaurav Garg 2026-03-29 22:05:18 +05:30
  • 1c128d941e remove junk gg/scripts-eval Georgi Gerganov 2026-03-29 17:31:04 +03:00
  • f5d1c4179f hexagon: dma optimizations (mostly fixing regressions) (#21137) b8578 Max Krasnyansky 2026-03-29 06:40:13 -07:00
  • 2405d59cb6 devops: including compute-runtime for intel.Dockerfile (#21076) Davi Henrique Linhares 2026-03-29 02:34:03 -03:00
  • afe65aa282 [SYCL] Enhance build script to use half cores to build, avoid OS hang (#21093) b8576 Neo Zhang 2026-03-29 09:02:45 +08:00
  • 65097181e4 fix **/x glob matching (#21129) b8575 Sigbjørn Skjæret 2026-03-28 22:27:38 +01:00
  • 98ae0a0d36 common/parser: fix handling of tool definition with missing properties key (#21128) b8574 Piotr Wilkin (ilintar) 2026-03-28 20:41:32 +01:00
  • 3a14a542f5 common : add character class support to glob_match (#21111) b8573 Sigbjørn Skjæret 2026-03-28 19:57:37 +01:00
  • 968189729f WebUI: Replace illegal nested button elements (#21026) BlueMöhre 2026-03-28 17:57:59 +01:00
  • e397d3885c common/json-schema: fix: handle non-capturing groups (?:...) in JSON schema pattern converter (#21124) b8571 Adrien 2026-03-28 17:55:38 +01:00
  • e6f2ec01ff common : add reasoning_format = none support to gpt-oss (#21094) b8570 Aldehir Rojas 2026-03-28 09:33:39 -05:00
  • edfb440a2f server : fix processing of multiple back-to-back mtmd chunks (#21107) b8569 Georgi Gerganov 2026-03-28 16:27:36 +02:00
  • 3d66da1809 ci : gracefully shut down the server (#21110) Adrien Gallouët 2026-03-28 14:49:57 +01:00
  • 82b703f8bc Document custom default webui preferences in server README (#19771) Woof Dog 2026-03-28 13:19:16 +00:00
  • 51a84efc53 webui: Conversation forking + branching improvements (#21021) Aleksander Grygier 2026-03-28 13:38:15 +01:00
  • b0f0dd3e51 vendor : update cpp-httplib to 0.40.0 (#21100) b8565 Adrien Gallouët 2026-03-28 08:59:44 +01:00
  • 0eb4764182 vulkan: add noncontiguous GLU support (#21081) Ruben Ortlam 2026-03-28 08:44:56 +01:00
  • 1f5d15e665 common/parser: fix reasoning whitespace bugs + extra parser tests (#21085) b8563 Piotr Wilkin (ilintar) 2026-03-28 07:29:26 +01:00
  • c46758d28f cli : add /glob command (#21084) b8562 Sigbjørn Skjæret 2026-03-28 02:33:04 +01:00
  • bf934f28db docker : fix and enable ARM64 image build (#20929) Ts-sound 2026-03-28 08:45:09 +08:00
  • 5c1a7b8355 server : add custom socket options to disable SO_REUSEPORT (#21056) b8560 Adrien Gallouët 2026-03-28 01:12:43 +01:00
  • f0fea264b0 cont : rand hadamard matrices gg/attn-rot-rand Georgi Gerganov 2026-03-27 20:11:47 +02:00
  • 59d840209a common : inhibit lazy grammar sampler while reasoning is active (#20970) b8559 Aldehir Rojas 2026-03-27 12:30:40 -05:00
  • ff934e29bc server: Introduce LLAMA_BUILD_WEBUI build flag to allow disabling the embedded web ui (#20158) b8558 Kusha Gharahi 2026-03-27 11:25:55 -05:00
  • ee051c1e4e hexagon: support for IQ4_NL and MXFP4 (#21018) b8557 Yiwei Shao 2026-03-27 09:22:41 -07:00
  • e6f6770515 webui: Improve Chat Messages initial scroll + auto-scroll logic + add lazy loading with transitions to content blocks (#20999) Aleksander Grygier 2026-03-27 17:01:36 +01:00
  • ff76c6731d cont : cache shift support gg/attn-rot-wip Georgi Gerganov 2026-03-27 14:39:14 +02:00
  • 7711b3a36a cont : rotate caches separately + support non-power-of-2 head sizes Georgi Gerganov 2026-03-27 13:56:22 +02:00
  • 48cda24c11 server: remove the verbose_prompt parameter (#21059) b8555 AN Long 2026-03-27 19:36:13 +08:00
  • 871f1a2d2f mtmd: add more sanity checks (#21047) b8554 Xuan-Son Nguyen 2026-03-27 11:00:52 +01:00
  • 832e32639f cont : rotate V more + refactor Georgi Gerganov 2026-03-27 11:29:16 +02:00
  • 20197b6fe3 server: add built-in tools backend support (#20898) b8553 Xuan-Son Nguyen 2026-03-27 10:07:11 +01:00
  • ba38f3becc rpc : proper handling of data pointers to CPU buffers (#21030) b8552 Radoslav Gerganov 2026-03-27 10:59:35 +02:00
  • 37f230dd7c completion : session_tokens insert range in completion tool (no-op → correct) (#20917) b8551 mtmcp 2026-03-27 05:25:58 -03:00
  • a308e584ca completion : Fix segfault on model load failure (#21049) b8550 mtmcp 2026-03-27 05:01:13 -03:00
  • d0fa2c9fbb Send reasoning content back to the model across turns via the reasoning_content API field (#21036) Pascal 2026-03-27 08:17:35 +01:00
  • 9bcb4eff4d metal : Fix dimension constraint violation in matmul2d descriptor (#21048) b8548 ren 2026-03-27 00:05:21 -07:00
  • 6861f6509a CANN: update docker images to 8.5.0 and improve CANN.md (#20801) b8547 KokerZhou 2026-03-27 08:53:00 +08:00
  • 1743d98057 mtmd: fix "v.patch_embd" quant and unsupported im2col ops on Metal for deepseek-ocr (#21027) b8546 Saba Fallah 2026-03-27 00:07:55 +01:00
  • 7ca0c9cca7 hip: use fnuz fp8 for conversion on CDNA3 (#21040) b8545 uvos 2026-03-26 23:06:33 +01:00
  • 8c60b8a2be ci: pin external actions to exact commit SHA (#21033) Xuan-Son Nguyen 2026-03-26 20:44:00 +01:00
  • 287b5b1eab common : add getpwuid fallback for HF cache when HOME is not set (#21035) Adrien Gallouët 2026-03-26 20:34:23 +01:00
  • a73bbd5d92 mtmd: refactor image preprocessing (#21031) Xuan-Son Nguyen 2026-03-26 19:49:20 +01:00
  • e5aa067d68 llama : rotate activations for better quantization Georgi Gerganov 2026-03-26 18:38:55 +02:00
  • ded446b34c opencl: allow large buffer for adreno (#20997) lhez 2026-03-26 08:52:21 -07:00
  • f8d4abae86 convert : support Qwen3.5/Qwen3.5 Moe NVFP4 and add input scales (#20505) Michael Wand 2026-03-26 08:52:06 -07:00
  • 3d5acab3e7 convert : add RuGPT3XL (RuGPT3XLForCausalLM) support (#21011) Pavel Zloi 2026-03-26 18:49:09 +03:00
  • 9900b29c3a common : filter out imatrix when finding models (#21023) Adrien Gallouët 2026-03-26 15:37:18 +01:00
  • dc8d14c582 fix(ggml): correct RISC-V ISA string canonical ordering for RVV in CMake (#20888) ihb2032 2026-03-26 19:08:41 +08:00
  • 93dfbc1291 common : make LLAMA_CACHE the one cache for everything (#21009) Adrien Gallouët 2026-03-26 12:04:57 +01:00
  • 3cba8bba18 common : fix split model migration (#21019) Adrien Gallouët 2026-03-26 12:04:37 +01:00
  • 112c78159f ggml-cuda: Add NVFP4 dp4a kernel (#20644) Michael Wand 2026-03-26 01:54:03 -07:00
  • 0fac87b157 imatrix : fix crash when using --show-statistics with zero counts (#19532) b8533 SamareshSingh 2026-03-26 02:14:36 -05:00
  • 0a524f2404 CUDA & CPU: support F32 kernel type for CONV_TRANSPOSE_2D (#17094) b8532 Yihao Wang 2026-03-25 19:19:14 -07:00
  • c0159f9c1f common : do not delete old files from the old cache when updating (#21000) b8531 Adrien Gallouët 2026-03-25 22:28:04 +01:00
  • a970515bdb mtmd: Add DeepSeekOCR Support (#17400) b8530 Saba Fallah 2026-03-25 19:57:40 +01:00
  • 4cd732f445 better wording xsn/ai_policy_private_repo Xuan Son Nguyen 2026-03-25 19:46:17 +01:00
  • 056b50c319 common : fix verbosity setup (#20989) b8529 Adrien Gallouët 2026-03-25 19:41:01 +01:00
  • 9f9a0bde37 contrib: update AI policy to allow private repo Xuan Son Nguyen 2026-03-25 19:39:41 +01:00
  • f2c72b8f1f common : fix gguf selection in common_list_cached_models (#20996) b8528 Adrien Gallouët 2026-03-25 19:18:06 +01:00
  • ec54ac13a8 ci : fix parsing of vgpr counts in hip-quality-check (#20987) uvos 2026-03-25 19:00:37 +01:00
  • 80322ebdaf model: codefuse-ai/F2LLM-v2 support b8526 Saba Fallah 2026-03-25 18:33:42 +01:00
  • 44c51e526b model : allow causal_attn and pooling_type on all architectures (#20973) b8525 Dowon 2026-03-26 02:12:38 +09:00
  • 1922f87c2f snapdragon: add missing features to WoS scripts to achieve parity with ADB scripts (#20884) Aparna M P 2026-03-25 22:13:12 +05:30
  • 345de3cd87 Use docker in build-android.yml (#20928) Shreya Jain 2026-03-25 09:36:27 -07:00
  • 9c600bcd4b llama-bench: print -n-cpu-moe when offloaded layers > 1 (#20984) b8522 Aman Gupta 2026-03-25 21:17:27 +08:00
  • b2704f9028 ci: Allow ninja to be used during unit test (#20742) Masato Nakasaka 2026-03-25 06:00:49 -07:00
  • 3fab96cd04 ci : disable self-hosted mac jobs (#20985) Georgi Gerganov 2026-03-25 14:46:40 +02:00
  • 914eb5ff0c jinja: fix macro with kwargs (#20960) b8519 Xuan-Son Nguyen 2026-03-25 12:22:48 +01:00
  • 8fc17493c3 gguf-split : clarify operation of gguf-split (#19749) Francisco Herrera 2026-03-25 06:12:50 -05:00
  • 36dafba5c4 llama: fix llama-model-saver (#20503) b8517 Johannes Gäßler 2026-03-25 11:53:16 +01:00
  • 69e0ecef06 webui: Fix editing assistant message without branching (#20944) Aleksander Grygier 2026-03-25 11:47:33 +01:00
  • 062cca58fc Add SLEEPING status to the WebUI model selector (#20949) Pascal 2026-03-25 11:02:32 +01:00
  • 406f4e3f61 android : fix-pointer-dangling (#20974) b8514 yikechayedan 2026-03-25 17:51:26 +08:00
  • 53dc8b59bf sycl : fix wrong variable check by assert (#20903) b8513 Neo Zhang 2026-03-25 17:48:37 +08:00
  • 403c9c9cef ci : bump gguf publish python version (#20982) Sigbjørn Skjæret 2026-03-25 10:04:59 +01:00
  • 8fc85db9d2 ci : limit requirements versions (#20980) Sigbjørn Skjæret 2026-03-25 09:55:37 +01:00
  • 3a60d06ad9 convert : register Qwen3Model architecture (#20967) Dowon 2026-03-25 17:37:59 +09:00
  • abd86ef175 docs : Update OpenVINO backend docs (#20968) Ravi Panchumarthy 2026-03-25 01:33:51 -07:00
  • 07a6fd8775 kleidiai: removed cpu feature detection from CI run script pr/20394 Martin Klacer 2026-03-24 17:24:41 +00:00
  • 9f102a1407 models : move the token embedding norms to the first layer (#20943) b8508 Georgi Gerganov 2026-03-24 17:00:30 +02:00
  • df488da9ac fix double semicolon 0cc4m/vulkan-repack Ruben Ortlam 2026-03-24 13:57:56 +01:00