🤗 Transformers: the model-definition framework for state-of-the-art machine learning models for text, vision, audio, and multimodal tasks, for both inference and training.
Add afmoe model (#42168)
* Add AFMoE model support
* Address review feedback for AFMoE implementation
* Add flex attention support to AFMoE model
* Fix expert_bias routing in AFMoE
* Remove test-results directory
* Address PR review feedback for AFMoE model
* fix(afmoe): ensure RMSNorm output dtype matches input dtype
* Properly return attention weights
* Fix most tests
* Cleanup: remove the shared-expert if/else (it defaults to 2), remove `route_norm` (it defaults to `True`), make tests smaller and faster
* Fix input embeds API
* Update RoPE API, smaller test, and should be good to go
* Oops, wrong place to skip unittest
* Quality
* Update
* Fill RoPE parameter docstring

---------

Co-authored-by: Arthur <[email protected]>
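The RMSNorm dtype fix mentioned above is a common pattern in Transformers model code: the norm is computed in float32 for numerical stability, then the result is cast back so the output dtype matches the input dtype. A minimal sketch of that pattern (hypothetical code, not the actual AFMoE implementation):

```python
import torch
from torch import nn


class RMSNorm(nn.Module):
    """Sketch of an RMSNorm whose output dtype matches its input dtype."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        input_dtype = x.dtype
        # Upcast to float32 so the mean/rsqrt are numerically stable
        # even when the model runs in float16/bfloat16.
        h = x.to(torch.float32)
        h = h * torch.rsqrt(h.pow(2).mean(-1, keepdim=True) + self.eps)
        # Cast back so the output dtype matches the input dtype.
        return (self.weight * h).to(input_dtype)
```

With this final cast, a half-precision input produces a half-precision output even though the statistics were accumulated in float32, which is what the commit's dtype fix guarantees.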
Raghav Ravishankar committed
cac0a28c83cf87b7a05495de3177099c635ba852
Parent: 2a61590
Committed by GitHub <[email protected]>
on 11/29/2025, 11:20:04 AM