Multiple updates and refactorings (#347)

* Optimize FP8/FP4 Mega MoE dispatch, scheduling, and shared memory layout
* Add BF16 accumulation/output support updates for GEMM paths
* Improve paged MQA scheduler and attention coverage
* Minor fixes and test updates
Ray Wang committed
88965b078186ee7510ab9fc4f1d5ebc19adfa8d1
Parent: 714dd1a
Committed by GitHub <noreply@github.com> on 6/1/2026, 9:11:18 AM