A high-throughput and memory-efficient inference and serving engine for LLMs
COMMITS: 20 in the last week
CONTRIBUTORS: 17 active
STARS: 0 total
FORKS: 0 total
TOP CONTRIBUTORS
Andreas Karatzas
Greg Pereira
Robert Shaw
bhargav-patel-29
liuchenbing2026
Micah Williamson
Netanel Haber
Kevin H. Luu
Wei Zhao
Martin Vit

RECENT COMMITS
[Model] Add support for BharatGen's Param2MoE model (#38000)
bhargav-patel-29
MiniMax-M2: add Eagle3 speculative decoding support (#37512)
liuchenbing2026
[ci] Switch some CI jobs to H200 MIG slices (#38956)
Kevin H. Luu
Revert "[vLLM IR] gemma_rms_norm" (#38998)
Robert Shaw
[vLLM IR] gemma_rms_norm (#38780)
Xiaoshuang Wang
[Perf][GDN] Align TMA usage with upstream FLA (#38981)
Artem Perevedentsev
[Bugfix] Fix DSV32 weight loading (#38870)
Yongye Zhu