A high-throughput and memory-efficient inference and serving engine for LLMs