🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Add afmoe model (#42168)

* Add AFMoE model support

* Address review feedback for AFMoE implementation

* Add flex attention support to AFMoE model

* Fix expert_bias routing in AFMoE

* Remove test-results directory

* Address PR review feedback for AFMoE model

* fix(afmoe): ensure RMSNorm output dtype matches input dtype

* properly return attn weights

* fix most tests

* cleanup
Remove the shared expert if/else, as it defaults to 2.
Remove `route_norm`, as it defaults to `True`.

Make the test smaller and faster

* fix input embeds api

* update rope API, smaller test and should be good to go

* oops, wrong place to skip unittest

* quality

* update

* rope parameter docstring fill

---------

Co-authored-by: Arthur <[email protected]>
Raghav Ravishankar committed
cac0a28c83cf87b7a05495de3177099c635ba852
Parent: 2a61590
Committed by GitHub <[email protected]> on 11/29/2025, 11:20:04 AM