SIGN IN SIGN UP
mudler / LocalAI UNCLAIMED

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

0 0 51 Go

TAGS

20 tags
v4.2.6

chore: :arrow_up: Update leejet/stable-diffusion.cpp to `bd17f53b7386fb5f60e8587b75e73c4b2fed3426` (#9854) :arrow_up: Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

v4.2.5

fix(ollama): accept float-encoded integer options (fixes #9837) (#9849) fix(ollama): accept float-encoded integer options (num_ctx, top_k, ...) Home Assistant's Ollama integration encodes integer options as JSON floats (e.g. `"num_ctx": 8192.0`). Stdlib `json.Unmarshal` refuses to decode a number with fractional notation into an `int` field, so the entire request was rejected with HTTP 400 before reaching the backend: Unmarshal type error: expected=int, got=number 8192.0, field=options.num_ctx Add a custom `UnmarshalJSON` on `OllamaOptions` that routes the int fields (`top_k`, `num_predict`, `seed`, `repeat_last_n`, `num_ctx`) through `*json.Number`, then converts via `Int64()` with a `Float64()` fallback. Public field types are unchanged, so endpoint code is untouched. Float fields and `stop` continue to parse via the default path. Fixes #9837 Assisted-by: Claude Code:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

v4.2.4

ci(image): publish missing :latest-* and :v<X>-* singleton image tags (#9812) * ci(image): wire singleton merges + `--` artifact separator Closes the same singletons gap on the LocalAI server image workflow that PR #9781 closed for backends. The user observed it as missing :latest-gpu-nvidia-cuda-12 etc. on quay.io/go-skynet/local-ai — the build matrix has six single-arch entries with no corresponding merge step, so their per-arch digests push (push-by-digest=true) and never get tagged: - -gpu-hipblas (hipblas-jobs) - -gpu-nvidia-cuda-12 (core-image-build) - -gpu-nvidia-cuda-13 (core-image-build) - -gpu-intel (core-image-build) - -nvidia-l4t-arm64 (gh-runner) - -nvidia-l4t-arm64-cuda-13 (gh-runner) Only :latest, :v<X>, :latest-gpu-vulkan and :v<X>-gpu-vulkan were actually being published before this commit (the two multiarch suffixes that had merge jobs). Changes: 1. image.yml: add six new merge jobs, one per single-arch entry. Each `needs:` only its parent build job (matching the existing pattern for core-image-merge / gpu-vulkan-image-merge). 2. image_build.yml: switch artifact name to `digests-localai<suffix>--<platform-tag-or-"single">`. The `--` separator anchors the merge-side glob so a singleton tag-suffix doesn't over-match a longer suffix that shares its prefix (-nvidia-l4t-arm64 vs -nvidia-l4t-arm64-cuda-13). Same convention as backend_build.yml's fix. 3. image_merge.yml: update the download pattern to match. Next master push or tag release should produce :latest-gpu-hipblas, :latest-gpu-nvidia-cuda-12, :latest-gpu-nvidia-cuda-13, :latest-gpu-intel, :latest-nvidia-l4t-arm64, :latest-nvidia-l4t-arm64-cuda-13 (and their :v<X>-* equivalents) for the first time on the post-#9781 workflow. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(image): add !cancelled() guard to all 8 image merge jobs Parity pass with backend.yml's merge jobs (8521af14). Without !cancelled(), GHA's default `needs:` cascade skips the merge when ANY matrix cell of the parent build job fails or is cancelled — so a single flaky leg would suppress publication of every other tag-suffix's manifest list. Same fix the backend got after v4.2.1 showed 2 failed singlearch builds cascade-skip 199 singlearch merge entries. Applied to all 8 image merges: - core-image-merge - gpu-vulkan-image-merge - gpu-nvidia-cuda-12-image-merge (added in e5300f1a) - gpu-nvidia-cuda-13-image-merge (added in e5300f1a) - gpu-intel-image-merge (added in e5300f1a) - gpu-hipblas-image-merge (added in e5300f1a) - nvidia-l4t-arm64-image-merge (added in e5300f1a) - nvidia-l4t-arm64-cuda-13-image-merge (added in e5300f1a) Build jobs (hipblas-jobs, core-image-build, gh-runner) are intentionally NOT changed — they have no upstream `needs:` to cascade- skip from. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

v4.2.3

chore: :arrow_up: Update ikawrakow/ik_llama.cpp to `f9a93c37e2fc021760c3c1aa99cf74c73b7591a7` (#9795) :arrow_up: Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

v4.2.2

feat(llama-cpp): bump to `1ec7ba0c`, adapt grpc-server, expose new spec-decoding options (#9765) * chore(llama.cpp): bump to 1ec7ba0c14f33f17e980daeeda5f35b225d41994 Picks up the upstream `spec : parallel drafting support` change (ggml-org/llama.cpp#22838) which reshapes the speculative-decoding API and `server_context_impl`. Adapt the grpc-server wrapper accordingly: * `common_params_speculative::type` (single enum) became `types` (`std::vector<common_speculative_type>`). Update both the "default to draft when a draft model is set" branch and the `spec_type`/`speculative_type` option parser. The parser now also tolerates comma-separated lists, mirroring the upstream `common_speculative_types_from_names` semantics. * `common_params_speculative_draft::n_ctx` is gone (draft now shares the target context size). Keep the `draft_ctx_size` option name for backward compatibility and ignore the value rather than failing. * `server_context_impl::model` was renamed to `model_tgt`; update the two reranker / model-metadata call sites. Replaces #9763. Builds cleanly under the linux/amd64 cpu-llama-cpp target locally. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(llama-cpp): expose new speculative-decoding option keys Upstream `spec : parallel drafting support` (ggml-org/llama.cpp#22838) adds the `ngram_mod`, `ngram_map_k`, and `ngram_map_k4v` speculative families and beefs up the draft-model knobs. The previous bump only adapted the API; this exposes the new fields through the grpc-server options dictionary so model configs can drive them. New `options:` keys (all under `backend: llama-cpp`): ngram_mod (`ngram_mod` type): spec_ngram_mod_n_min / spec_ngram_mod_n_max / spec_ngram_mod_n_match ngram_map_k (`ngram_map_k` type): spec_ngram_map_k_size_n / spec_ngram_map_k_size_m / spec_ngram_map_k_min_hits ngram_map_k4v (`ngram_map_k4v` type): spec_ngram_map_k4v_size_n / spec_ngram_map_k4v_size_m / spec_ngram_map_k4v_min_hits ngram lookup caches (`ngram_cache` type): spec_lookup_cache_static / lookup_cache_static spec_lookup_cache_dynamic / lookup_cache_dynamic Draft-model tuning (active when `spec_type` is `draft`): draft_cache_type_k / spec_draft_cache_type_k draft_cache_type_v / spec_draft_cache_type_v draft_threads / spec_draft_threads draft_threads_batch / spec_draft_threads_batch draft_cpu_moe / spec_draft_cpu_moe (bool flag) draft_n_cpu_moe / spec_draft_n_cpu_moe (first N MoE layers on CPU) draft_override_tensor / spec_draft_override_tensor (comma-separated <tensor regex>=<buffer type>; re-implements upstream's static parse_tensor_buffer_overrides since it isn't exported) `spec_type` already accepted comma-separated lists after the previous commit, matching upstream's `common_speculative_types_from_names`. Docs: refresh `docs/content/advanced/model-configuration.md` with per-family tables and a note about multi-type chaining. Builds locally with `make docker-build-llama-cpp` (linux/amd64 cpu-llama-cpp AVX variant). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(turboquant): bridge new llama.cpp spec API to the legacy fork layout The previous commits in this series adapted backend/cpp/llama-cpp/grpc-server.cpp to the post-#22838 (parallel drafting) llama.cpp API. The turboquant build reuses the same grpc-server.cpp through backend/cpp/turboquant/Makefile, which copies it into turboquant-<flavor>-build/ and runs patch-grpc-server.sh on the copy. The fork branched before the API refactor, so it errors out on: * `ctx_server.impl->model_tgt` (fork still has `model`) * `params.speculative.{ngram_mod,ngram_map_k,ngram_map_k4v,ngram_cache}.*` (none of these sub-structs exist in the fork) * `params.speculative.draft.{cache_type_k/v, cpuparams[, _batch].n_threads, tensor_buft_overrides}` (fork uses the pre-#22397 flat layout) * `params.speculative.types` vector / `common_speculative_types_from_names` (fork has a scalar `type` and only the singular helper) Approach: 1. backend/cpp/llama-cpp/grpc-server.cpp: introduce a single feature switch `LOCALAI_LEGACY_LLAMA_CPP_SPEC`. When defined, the two `speculative.type[s]` discriminations (the "default to draft when a draft model is set" branch and the `spec_type` / `speculative_type` option parser) fall back to the singular scalar form, and the entire new-option block (ngram_mod / map_k / map_k4v / ngram_cache / draft.{cache_type_*, cpuparams*, tensor_buft_overrides}) is preprocessed out. The macro is *not* defined in the source tree — stock llama-cpp builds get the full new API. 2. backend/cpp/turboquant/patch-grpc-server.sh: two new patch steps applied to the per-flavor build copy at turboquant-<flavor>-build/grpc-server.cpp: - substitute `ctx_server.impl->model_tgt` -> `ctx_server.impl->model` - inject `#define LOCALAI_LEGACY_LLAMA_CPP_SPEC 1` before the first `#include`, so the guarded blocks above drop out for the fork build. Both patches are idempotent and follow the existing sed/awk pattern in this script (KV cache types, `get_media_marker`, flat speculative renames). Stock llama-cpp's `grpc-server.cpp` is never touched. Drop both legacy patches once the turboquant fork rebases past ggml-org/llama.cpp#22397 / #22838. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(turboquant): close draft_ctx_size brace inside legacy guard The previous turboquant fix wrapped the new option-handler blocks in `#ifndef LOCALAI_LEGACY_LLAMA_CPP_SPEC ... #endif` but placed the guard in the middle of an `else if` chain — the `} else if` openings of the new blocks were responsible for closing the previous block's brace. With the macro defined the new blocks vanish, draft_ctx_size's `{` loses its closer, the for-loop's `}` is consumed instead, and the file ends with a stray opening brace — clang reports it as `function-definition is not allowed here before '{'` on the next top-level `int main(...)` and `expected '}' at end of input`. Move the chain split inside the draft_ctx_size branch: } else if (... "draft_ctx_size") { // ... #ifdef LOCALAI_LEGACY_LLAMA_CPP_SPEC } // legacy: chain ends here #else } else if (... "spec_ngram_mod_n_min") { // modern: chain continues ... } else if (... "draft_override_tensor") { ... } // closes last branch #endif } // closes for-loop Brace count is now balanced under both preprocessor branches (verified with `tr -cd '{' | wc -c` against the patched and unpatched outputs). Local `make docker-build-turboquant` builds the linux/amd64 cpu-llama-cpp `turboquant-avx` variant cleanly. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(ci): forward AMDGPU_TARGETS into Dockerfile.turboquant builder-prebuilt Dockerfile.turboquant's `builder-prebuilt` stage was missing the `ARG AMDGPU_TARGETS` / `ENV AMDGPU_TARGETS=${AMDGPU_TARGETS}` pair that `builder-fromsource` already has (and that `Dockerfile.llama-cpp` mirrors across both stages). When CI uses the prebuilt base image (quay.io/go-skynet/ci-cache:base-grpc-*, the common path) the build-arg passed by the workflow never reaches the env inside the compile stage. backend/cpp/llama-cpp/Makefile:38 (introduced by #9626) errors out on hipblas builds when AMDGPU_TARGETS is empty, and the turboquant Makefile reuses backend/cpp/llama-cpp via a sibling build dir, so the same check fires from turboquant-fallback under BUILD_TYPE=hipblas: Makefile:38: *** AMDGPU_TARGETS is empty — set it to a comma-separated list of gfx targets e.g. gfx1100,gfx1101. Stop. make: *** [Makefile:66: turboquant-fallback] Error 2 The bug is latent on master because the docker layer cache stays warm across builds — the compile step rarely re-runs from scratch. The llama.cpp bump in this PR invalidates the cache, so the missing env var becomes load-bearing and the hipblas turboquant CI job fails. Mirror the existing pattern from Dockerfile.llama-cpp. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

v4.2.1

feat(ollama): report model capabilities + details on /api/tags and /api/show (#9766) Ollama-compatible clients (Open WebUI, Enchanted, ollama-grid-search, etc.) rely on the `capabilities` list and `details.{parameter_size, quantization_level,families}` fields returned by /api/tags and /api/show to decide which models are eligible for a given task -- for example to filter the "embedding model" picker. Upstream Ollama returns these; LocalAI's compat layer was leaving them empty, so embedding models were silently rejected by clients that only allow chat models for chat and only allow embedding models for embeddings. This wires up the existing config signals already present in ModelConfig: - modelCapabilities() derives the Ollama capability strings from the config: "embedding" (FLAG_EMBEDDINGS), "completion" (FLAG_CHAT / FLAG_COMPLETION), "vision" (explicit KnownUsecases bit or MMProj / multimodal template / backend media marker), "tools" (auto-detected ToolFormatMarkers, JSON/Response regex, XML format, grammar triggers), "thinking" (ReasoningConfig with reasoning not disabled) and "insert" (presence of a completion template). - modelDetailsFromModelConfig() now fills families, parameter_size and quantization_level. The latter two are parsed from the GGUF filename via regex -- conservative tokens only (Q*/IQ*/F16/F32/BF16 and \d+(\.\d+)?[BM] surrounded by separators) so we don't accidentally match "Qwen3" as "3B". - modelInfoFromModelConfig() exposes general.architecture and general.context_length in the new ShowResponse.model_info map. Note: HasUsecases(FLAG_VISION) cannot be used directly -- GuessUsecases has no FLAG_VISION case and returns true at the end for any chat model. hasVisionSupport() instead reads KnownUsecases explicitly plus MMProj / template / media-marker signals. Tests are written first (TDD) using Ginkgo/Gomega -- DescribeTable for the capability mapping (embedding-only, chat, vision, thinking, tools via markers, tools via JSON regex, no-capability rerank) plus integration tests against ShowModelEndpoint that round-trip JSON through a real ModelConfigLoader populated from a temp YAML file. Fixes #9760. Assisted-by: Claude Code:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

v4.2.0

chore: :arrow_up: Update ggml-org/llama.cpp to `389ff61d77b5c71cec0cf92fe4e5d01ace80b797` (#9752) :arrow_up: Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

v4.1.3

chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.64.0 to 0.65.0 (#9254) chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus Bumps [go.opentelemetry.io/otel/exporters/prometheus](https://github.com/open-telemetry/opentelemetry-go) from 0.64.0 to 0.65.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/exporters/prometheus/v0.64.0...exporters/prometheus/v0.65.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel/exporters/prometheus dependency-version: 0.65.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

v4.1.2

docs: :arrow_up: update docs version mudler/LocalAI (#9241) :arrow_up: Update docs version mudler/LocalAI Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

v4.1.1

chore(gallery): add mmproj file for gemma4 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

v4.1.0

chore(model gallery): :robot: add 1 new models via gallery agent (#9202) chore(model gallery): :robot: add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

v4.0.0

fix(ui): do not let from button to trigger Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

v3.12.1

chore: :arrow_up: Update ggml-org/llama.cpp to `ba3b9c8844aca35ecb40d31886686326f22d2214` (#8613) :arrow_up: Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

v3.12.0

fix(ui): pass by needed values to unbreak model editor Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

v3.11.0

chore: :arrow_up: Update ggml-org/llama.cpp to `8872ad2125336d209a9911a82101f80095a9831d` (#8448) :arrow_up: Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

v3.10.1

feat(qwen-tts): add Qwen-tts backend (#8163) * feat(qwen-tts): add Qwen-tts backend Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Update intel deps Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop flash-attn for cuda13 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

v3.10.0

chore: drop neutts for l4t (#8101) Builds exhausts CI currently, and there are better backends at this point in time. We will probably deprecate it in the future. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

v3.9.0

chore(model gallery): :robot: add 1 new models via gallery agent (#7712) chore(model gallery): :robot: add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

v3.8.0

fix: Initialize sudo reference before its first actual use (#7360)

v3.7.0

chore: :arrow_up: Update ggml-org/llama.cpp to `31c511a968348281e11d590446bb815048a1e912` (#6970) :arrow_up: Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>