A high-throughput and memory-efficient inference and serving engine for LLMs
COMMITS: 20 in the last week
CONTRIBUTORS: 18 active
STARS: 0 total
FORKS: 0 total
TOP CONTRIBUTORS
omerpaz95
z1ying
Netanel Haber
TJian
Flora Feng
Luciano Martins
mysterious hhhh
Dan Alistarh
Yusuf Mohammad
Jee Jee Li
RECENT COMMITS
Optimize nemotron VL image/video preprocessing (#40283)
Netanel Haber
Added general ND x ND matmul and unit test for it (#39909)
Yusuf Mohammad
[DOC] Add fuse_minimax_qk_norm (#39782)
Jee Jee Li
[ZenCPU] AMD Zen CPU Backend with supported dtypes via zentorch weekly (#39967)
Chinmay-Kulkarni-AMD
[Bugfix] Fix k_proj's bias for GLM-ASR (#40160)
Rishapveer Singh
[CI] Speed up test_fused_marlin_moe (#40178)
Michael Goin
[XPU]fake impl for xpu fp8_gemm (#39984)
Xinyu Chen