
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.


🚨 [`FA4`] Initial support (#42435)

* initial implementation

* CB support

* change how we call item on max_seq_len_q/k

* fix

* tests

* fix fa2 clash

* unify the fa dispatch

* fix

* modernbert...

* oops

* parity test
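
A parity test here presumably checks that the new FA4 path matches a reference attention implementation on the same inputs. As an illustrative sketch (not the repo's actual test), the core reason a blockwise flash-style pass can match the naive reference exactly is the "online softmax" rescaling trick:

```python
# Illustrative parity check (not the repo's test): a blockwise "online softmax"
# attention pass should match the naive reference within floating-point tolerance.
import numpy as np

def reference_attention(q, K, V):
    s = K @ q                        # scores for a single query vector
    p = np.exp(s - s.max())
    p /= p.sum()
    return p @ V

def blockwise_attention(q, K, V, block=4):
    m, l = -np.inf, 0.0              # running max and running normalizer
    acc = np.zeros(V.shape[1])
    for i in range(0, K.shape[0], block):
        s = K[i:i + block] @ q
        m_new = max(m, s.max())
        alpha = np.exp(m - m_new)    # rescale previously accumulated results
        p = np.exp(s - m_new)
        l = l * alpha + p.sum()
        acc = acc * alpha + p @ V[i:i + block]
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q, K, V = rng.normal(size=8), rng.normal(size=(16, 8)), rng.normal(size=(16, 4))
assert np.allclose(reference_attention(q, K, V), blockwise_attention(q, K, V))
```

The two paths compute the same softmax-weighted sum, just in different orders, so any disagreement beyond floating-point tolerance signals a kernel bug.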

* style

* nit

* fixup imports for fa4

* enable attention sinks, fixup logits checks in parity test
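
For context on the attention-sinks bullet, a minimal sketch of the commonly used formulation (illustrative only, not the repo's kernel code): a per-head sink logit joins the softmax normalizer but contributes no value, so the model can "park" probability mass on no token at all.

```python
# Hypothetical sketch of attention sinks: the sink logit participates in the
# softmax denominator but carries no value vector, draining attention mass.
import numpy as np

def attention_with_sink(scores, V, sink_logit):
    z = np.concatenate(([sink_logit], scores))
    z = np.exp(z - z.max())
    probs = z / z.sum()
    # probs[0] is the sink's share; only real tokens contribute values
    return probs[1:] @ V
```

With a very negative sink logit this reduces to ordinary softmax attention; with a large one, the output shrinks toward zero, which is exactly what a parity test's logits checks need to account for.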

* style

* change dispatch logic and introduce lower bound for FA
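
The dispatch-plus-lower-bound idea can be sketched roughly as follows (names and version numbers are hypothetical, not the actual Transformers code): each flash-attention backend has a minimum supported version, and anything unknown or too old falls back to a default implementation.

```python
# Hypothetical sketch: dispatch between flash-attention backends, enforcing a
# simple minimum version per backend instead of a list of allowed versions.
from typing import Optional

# Assumed minimum versions; illustrative only.
MIN_FA_VERSIONS = {"flash_attention_2": (2, 5, 0), "flash_attention_4": (4, 0, 0)}

def _parse_version(version: str) -> tuple:
    # "2.6.3" -> (2, 6, 3); pre-release suffixes are ignored for simplicity
    return tuple(int(part) for part in version.split(".")[:3] if part.isdigit())

def dispatch_flash_attention(requested: str, installed_version: Optional[str]) -> str:
    """Return the attention implementation to use, falling back to eager."""
    if installed_version is None:
        return "eager"  # backend not installed at all
    minimum = MIN_FA_VERSIONS.get(requested)
    if minimum is None or _parse_version(installed_version) < minimum:
        return "eager"  # unknown backend, or below the supported lower bound
    return requested
```

A single lower bound (rather than an allowlist of versions) keeps the check cheap and means new patch releases work without touching the table.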

* style

* fix test

* min fa2, avoid 2x device sync

* style

* simple min version instead of list

* fixup error message on non init check

* fix up the non-init check a tad more

* refactor some FA constants out to main fa utils

* new marker for all fas needed

* oops

* style and make the fa kernel fallback generalized
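
A generalized kernel fallback like the one this bullet describes could look roughly like this (function and module names are illustrative, not the repo's API): try each preferred kernel in order and degrade gracefully when none can be loaded.

```python
# Hypothetical sketch: probe a list of candidate attention kernels and fall
# back to a default implementation if none of them is importable.
import importlib

def resolve_attention_kernel(preferred, default="eager"):
    """Return the first importable kernel module name, else the default."""
    for name in preferred:
        try:
            importlib.import_module(name)
        except ImportError:
            continue  # kernel not installed; try the next candidate
        return name
    return default
```

Keeping the probe loop generic means adding a new backend is just one more entry in the candidate list, with no per-backend fallback branches.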

* default None...

* more refactors

* style

* fix

* this test is faulty even on main, xformers can apparently handle any shape

* let's make this more robust, we should check for None within...

* fix

* oops
Anton Vlasjuk committed
65db6fc07c776406f7b4afe1ee5ecdbeb7202af7
Parent: 39f1c8d
Committed by GitHub <[email protected]> on 3/13/2026, 7:19:37 PM