
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.


DeepGEMM (#44832)

* deep gemm

* standardize

* clear deepgemm flow and blackwell optimization

* avoid unnecessary gathers in grouped mm

* assertions and drop the synced path

* use lazy load kernel

* style

* add prefix and check for cuda runtime version

* better names

* exit on missing functions

* comment about why we use deepspeed cutlass single gemm

* global statements

* add cuda check

* fix deepgemm fastpath for models with bf16 scales

* force fp32 scales in experts as well

* Update src/transformers/integrations/finegrained_fp8.py

Co-authored-by: Anton Vlasjuk <[email protected]>

* Update src/transformers/integrations/hub_kernels.py

Co-authored-by: Anton Vlasjuk <[email protected]>

* fix

* style

---------

Co-authored-by: Anton Vlasjuk <[email protected]>
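The "use lazy load kernel" step above can be sketched as a cached loader that resolves the kernel module only on first use, so importing the library stays cheap. This is a minimal sketch, not the actual hub_kernels API; `load_kernel` and the use of `math` as a stand-in module are assumptions for illustration.

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=None)
def load_kernel(module_name: str):
    # Import the kernel module only on the first call; subsequent calls
    # return the cached module object, so the import cost is paid once
    # and only if the fast path is actually exercised.
    return importlib.import_module(module_name)


def gemm(a: int, b: int) -> int:
    # The heavy dependency is resolved lazily, at call time.
    # "math" here is a stand-in for a real CUDA kernel package.
    kernel = load_kernel("math")
    return kernel.prod([a, b])
```

The cache also guarantees that repeated calls share one module object, which matters when the kernel holds global state such as compiled CUDA binaries.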
Ilyas Moutawwakil committed
bc576731d46cdf0936daf0833dc1a1bdd1b4898a
Parent: 81db7d3
Committed by GitHub <[email protected]> on 3/31/2026, 3:04:02 PM