DeepGEMM (#44832)
* deep gemm
* standardize
* clear deepgemm flow and blackwell optimization
* avoid unnecessary gathers in grouped mm
* assertions and drop the synced path
* use lazy load kernel
* style
* add prefix and check for cuda runtime version
* better names
* exit on missing functions
* comment about why we use deepspeed cutlass single gemm
* global statements
* add cuda check
* fix deepgemm fastpath for models with bf16 scales
* force fp32 scales in experts as well
* Update src/transformers/integrations/finegrained_fp8.py

  Co-authored-by: Anton Vlasjuk <[email protected]>
* Update src/transformers/integrations/hub_kernels.py

  Co-authored-by: Anton Vlasjuk <[email protected]>
* fix
* style

---------

Co-authored-by: Anton Vlasjuk <[email protected]>
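Several of the bullets above ("use lazy load kernel", "add prefix and check for cuda runtime version", "exit on missing functions") describe the same guard pattern: resolve the kernel only on first use, validate the CUDA runtime version, and fail fast if required entry points are absent. A minimal sketch of that pattern follows; all names here (`MIN_CUDA_RUNTIME`, `_query_cuda_runtime_version`, `load_deepgemm_kernel`) are illustrative assumptions, not the actual `transformers` integration API, and the version floor is a placeholder.

```python
import functools

# Assumed minimum CUDA runtime; the real requirement in the PR may differ.
MIN_CUDA_RUNTIME = (12, 3)

def _query_cuda_runtime_version():
    """Stand-in for querying the real CUDA runtime version
    (e.g. via torch.version.cuda or cudaRuntimeGetVersion)."""
    return (12, 4)

@functools.lru_cache(maxsize=None)
def load_deepgemm_kernel():
    """Lazily resolve the kernel on first call (lru_cache makes later
    calls free), checking the runtime version and required functions."""
    version = _query_cuda_runtime_version()
    if version < MIN_CUDA_RUNTIME:
        raise RuntimeError(
            f"DeepGEMM needs CUDA runtime >= {MIN_CUDA_RUNTIME}, got {version}"
        )
    # Placeholder for the actually-loaded kernel module.
    kernel = {"grouped_gemm": lambda a, b: "ok"}
    # "exit on missing functions": verify every required entry point exists.
    for fn in ("grouped_gemm",):
        if fn not in kernel:
            raise RuntimeError(f"kernel is missing required function {fn!r}")
    return kernel
```

Because the loader is cached, import cost and validation are paid once on first use rather than at module import time, which keeps the fast path cheap for models that never hit the DeepGEMM route.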
Ilyas Moutawwakil committed
bc576731d46cdf0936daf0833dc1a1bdd1b4898a
Parent: 81db7d3
Committed by GitHub <[email protected]>
on 3/31/2026, 3:04:02 PM