feat(llama-cpp): expose split_mode option for multi-GPU placement (#9560)
Adds split_mode (alias sm) to the llama.cpp backend options allowlist, accepting none|layer|row|tensor.

The tensor value targets the experimental backend-agnostic tensor parallelism from ggml-org/llama.cpp#19378. It has four requirements: a llama.cpp build that includes that PR, FlashAttention enabled, KV-cache quantization disabled, and a manually set context size.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
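The option handling described above can be sketched as follows. This is a hypothetical illustration, not the LocalAI or llama.cpp implementation. It makes three assumptions: backend options arrive as "key:value" strings, the key split_mode and its alias sm resolve to the same setting, and "unquantized" KV cache means f16, f32, or bf16 (with an empty value treated as the f16 default). The names ParseSplitMode, TensorSettings, and CheckTensorRequirements are invented for this sketch. The check cannot verify that the llama.cpp build includes ggml-org/llama.cpp#19378.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitMode mirrors the four values accepted for split_mode / sm.
type SplitMode string

const (
	SplitNone   SplitMode = "none"
	SplitLayer  SplitMode = "layer"
	SplitRow    SplitMode = "row"
	SplitTensor SplitMode = "tensor"
)

// ParseSplitMode parses a hypothetical "key:value" option string.
// ok is false when the key is neither split_mode nor its alias sm.
func ParseSplitMode(opt string) (mode SplitMode, ok bool, err error) {
	key, value, found := strings.Cut(opt, ":")
	if !found {
		return "", false, nil
	}
	key = strings.TrimSpace(key)
	if key != "split_mode" && key != "sm" {
		return "", false, nil
	}
	switch m := SplitMode(strings.ToLower(strings.TrimSpace(value))); m {
	case SplitNone, SplitLayer, SplitRow, SplitTensor:
		return m, true, nil
	default:
		return "", true, fmt.Errorf("invalid split_mode %q: want none|layer|row|tensor", value)
	}
}

// TensorSettings holds the (hypothetical) model settings that tensor mode depends on.
type TensorSettings struct {
	FlashAttention bool
	CacheTypeK     string // assumption: "", f16, f32, bf16 count as unquantized
	CacheTypeV     string
	ContextSize    int // 0 means the context size was not set manually
}

// CheckTensorRequirements lists which documented prerequisites for
// split_mode=tensor are not met. It cannot inspect the llama.cpp build.
func CheckTensorRequirements(s TensorSettings) []string {
	var problems []string
	if !s.FlashAttention {
		problems = append(problems, "FlashAttention must be enabled")
	}
	if isQuantized(s.CacheTypeK) || isQuantized(s.CacheTypeV) {
		problems = append(problems, "KV-cache quantization must be disabled")
	}
	if s.ContextSize <= 0 {
		problems = append(problems, "context size must be set manually")
	}
	return problems
}

// isQuantized treats any cache type other than the float formats as quantized.
func isQuantized(cacheType string) bool {
	switch strings.ToLower(cacheType) {
	case "", "f16", "f32", "bf16":
		return false
	default:
		return true
	}
}

func main() {
	// The alias "sm" resolves to the same setting as "split_mode".
	mode, ok, err := ParseSplitMode("sm:tensor")
	fmt.Println(mode, ok, err)

	// FlashAttention is off and the K cache is quantized: two problems are reported.
	for _, p := range CheckTensorRequirements(TensorSettings{CacheTypeK: "q8_0", ContextSize: 4096}) {
		fmt.Println("tensor split:", p)
	}
}
```

Returning a list of problems, rather than stopping at the first one, lets a caller report every unmet tensor-mode prerequisite at once.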
Ettore Di Giacinto committed
Commit: 21eace40ecc58a1dcd02f4cef4ecbcff0bf13480
Parent: 24505e5
Committed by GitHub <noreply@github.com> on 4/25/2026, 12:02:57 PM