Multiple updates and refactorings (#347)
* Optimize FP8/FP4 Mega MoE dispatch, scheduling, and shared memory layout
* Add BF16 accumulation/output support updates for GEMM paths
* Improve paged MQA scheduler and attention coverage
* Minor fixes and test updates
Ray Wang committed
88965b078186ee7510ab9fc4f1d5ebc19adfa8d1
Parent: 714dd1a
Committed by GitHub <noreply@github.com>
on 6/1/2026, 9:11:18 AM