Making large AI models cheaper, faster and more accessible
[hotfix] moe hybrid parallelism benchmark & follow-up fix (#6048)
* [example] pass use_fp8_comm flag to all plugins
* [example] add mixtral benchmark
* [moe] refine assertion and check
* [moe] fix mixtral & add more tests
* [moe] consider checking dp * sp group and moe_dp_group
* [mixtral] remove gate tp & add more tests
* [deepseek] fix tp & sp for deepseek
* [mixtral] minor fix
* [deepseek] add deepseek benchmark
botbw committed
c54c4fcd15b70830e2efe00df0a6087b9ce5f6b1
Parent: 8fd25d6
Committed by GitHub <noreply@github.com>
on 9/10/2024, 9:30:53 AM