vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs


[Bugfix][MoE] Fix 6-8% decode regression: prefer multi-stream shared expert overlap (#38990)
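The technique the commit title names, overlapping the MoE shared-expert computation with the routed-expert computation on a separate CUDA stream, can be sketched roughly as below. This is a hypothetical minimal illustration, not vLLM's actual implementation; the function and parameter names are invented for this example, and a real implementation would also handle cross-stream memory lifetimes (e.g. `Tensor.record_stream`).

```python
# Hypothetical sketch of multi-stream shared-expert overlap (NOT vLLM's code):
# launch the shared-expert GEMM on a secondary CUDA stream so it runs
# concurrently with the routed-expert compute, then synchronize and combine.
import torch

def moe_forward(x, shared_expert, routed_experts):
    """Run the shared expert concurrently with the routed experts when CUDA
    streams are available; fall back to sequential execution otherwise."""
    if torch.cuda.is_available():
        side = torch.cuda.Stream()
        # Ensure the side stream sees all prior work on x.
        side.wait_stream(torch.cuda.current_stream())
        with torch.cuda.stream(side):
            shared_out = shared_expert(x)   # overlaps with routed compute below
        routed_out = routed_experts(x)      # issued on the default stream
        # Block the default stream until the shared-expert result is ready.
        torch.cuda.current_stream().wait_stream(side)
    else:
        # CPU fallback: no streams, just run back to back.
        shared_out = shared_expert(x)
        routed_out = routed_experts(x)
    return shared_out + routed_out
```

During decode, batches are small and both branches underutilize the GPU, so running them on separate streams hides the shared-expert latency instead of serializing it; that is the plausible mechanism behind the 6-8% decode-throughput recovery the title describes.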

Signed-off-by: Martin Vit <martin@voipmonitor.org>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Martin Vit committed
228023b3a58f78ed028cb4e5fb4e078bb1574262
Parent: 9a52826
Committed by GitHub <noreply@github.com> on 4/5/2026, 2:28:31 PM