:rotating_light: [`FA4`] Initial support (#42435)
* initial implementation
* CB (continuous batching) support
* change how we call `.item()` on `max_seq_len_q`/`max_seq_len_k` (see the single-sync sketch after this list)
* fix
* tests
* fix FA2 clash
* unify the FA dispatch
* fix
* modernbert...
* oops
* parity test
* style
* nit
* fix up imports for FA4
* enable attention sinks, fix up logits checks in parity test
* style
* change dispatch logic and introduce a lower bound on the FA version (see the dispatch sketch after this list)
* style
* fix test
* minimum FA2 version, avoid 2x device sync
* style
* simple minimum version instead of a list
* fix up error message on the non-init check
* fix up the non-init check a tad more
* refactor some FA constants out to the main FA utils
* new marker for tests needing all FAs
* oops
* style, and make the FA kernel fallback generalized
* default None...
* more refactors
* style
* fix
* this test was faulty even on main; xformers can handle any shape, apparently
* make this more robust: we should check for None within...
* fix
* oops
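One bullet above concerns how `.item()` is called on `max_seq_len_q`/`max_seq_len_k`: each `.item()` on a CUDA tensor forces a GPU-to-CPU sync, so calling it on each tensor separately costs two syncs. A minimal sketch of a single-sync variant, assuming `seqlens_q`/`seqlens_k` are per-sequence length tensors on the GPU; the function name and shapes are illustrative, not the actual transformers code:

```python
import torch


def max_seqlens_single_sync(
    seqlens_q: torch.Tensor, seqlens_k: torch.Tensor
) -> tuple[int, int]:
    """Fetch both maxima with one GPU->CPU transfer instead of two .item() syncs."""
    # Compute both maxima on-device, then move them together in a single sync.
    maxima = torch.stack([seqlens_q.max(), seqlens_k.max()]).cpu()
    # Indexing the CPU tensor is free of further device syncs.
    return int(maxima[0]), int(maxima[1])
```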
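The dispatch-related bullets ("unify the FA dispatch", "introduce a lower bound on the FA version", "simple minimum version instead of a list", "make the FA kernel fallback generalized") suggest a version-gated backend selector with a generic fallback. A hedged sketch of that idea; the minimum versions, backend strings, and helper names below are assumptions for illustration, not the actual transformers implementation:

```python
# Hypothetical sketch: MIN_FA2_VERSION, MIN_FA4_VERSION, and the returned
# backend strings are illustrative assumptions, not the real values.
import importlib.metadata

from packaging import version

MIN_FA2_VERSION = version.parse("2.5.0")  # assumed lower bound
MIN_FA4_VERSION = version.parse("4.0.0")  # assumed lower bound


def _installed_version(distribution: str):
    """Return the parsed version of `distribution`, or None if not installed."""
    try:
        return version.parse(importlib.metadata.version(distribution))
    except importlib.metadata.PackageNotFoundError:
        return None


def dispatch_attention_backend() -> str:
    """Pick the newest flash-attention backend whose minimum version is met,
    falling back to a generic kernel (e.g. SDPA) when none is usable."""
    fa = _installed_version("flash-attn")
    if fa is not None:
        if fa >= MIN_FA4_VERSION:
            return "flash_attention_4"
        if fa >= MIN_FA2_VERSION:
            return "flash_attention_2"
        raise ImportError(
            f"flash-attn {fa} is below the supported minimum {MIN_FA2_VERSION}; "
            "upgrade flash-attn or use the default attention backend."
        )
    return "sdpa"  # generalized fallback when no FA kernel is available
```

Using a single minimum-version constant per backend (rather than a list of allowed versions) keeps the check cheap and the error message simple, which matches the "simple minimum version instead of a list" bullet.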
Author: Anton Vlasjuk
Commit: 65db6fc07c776406f7b4afe1ee5ecdbeb7202af7
Parent: 39f1c8d
Committed by: GitHub <[email protected]>
Date: 3/13/2026, 7:19:37 PM