A high-throughput and memory-efficient inference and serving engine for LLMs
[Bugfix][MoE] Fix 6-8% decode regression: prefer multi-stream shared expert overlap (#38990)
Signed-off-by: Martin Vit <martin@voipmonitor.org>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Martin Vit committed
228023b3a58f78ed028cb4e5fb4e078bb1574262
Parent: 9a52826
Committed by GitHub <noreply@github.com>
on 4/5/2026, 2:28:31 PM