
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models for text, vision, audio, and multimodal tasks, for both inference and training.


🚨 fix + tests dense & MoE TP all reduce (decoder only) (#43722)

* introducing a tensor parallel test mixin to catch TP related errors

* Remove test file for tensor parallel functionality

* Refactor dense and MoE test scripts for parallel execution and improved GPU management

- Updated `run_dense_tests.sh` and `run_moe_tests.sh` to support parallel execution of tests using available GPU pairs.
- Changed variable names for clarity, replacing `NUM_GPUS` with `GPUS_PER_TEST`.
- Enhanced output messages to reflect the number of parallel test slots and GPU usage.
- Implemented logic to handle skipped tests and updated result reporting to include skipped counts.
- Removed `TensorParallelTesterMixin` from `CausalLMModelTest` and integrated it into `ModelTesterMixin` for better structure in test classes.
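
The pair-scheduling described above (splitting available GPUs into slots of `GPUS_PER_TEST`) can be sketched in Python; the actual logic lives in the shell scripts, and this helper name is illustrative:

```python
def gpu_pairs(gpu_ids, gpus_per_test=2):
    """Group available GPU ids into disjoint slots of `gpus_per_test`
    so that independent TP tests can run on each slot in parallel."""
    slots = [gpu_ids[i:i + gpus_per_test] for i in range(0, len(gpu_ids), gpus_per_test)]
    # Drop a trailing incomplete slot that cannot host a full TP test.
    return [s for s in slots if len(s) == gpus_per_test]
```

With five GPUs and two GPUs per test, `gpu_pairs([0, 1, 2, 3, 4])` yields two usable slots, `[[0, 1], [2, 3]]`, and GPU 4 stays idle.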

* restore

* add all reduce for ep

* fix init and bias sharding

* fix finalize weight init

* add full stacktracing

* fix

* add report to run tests

* okay big improvement here

* the only case where the shard index should be used is when we are actually collecting for mergeModuleList

* more fixes

* fix EP forward gpt oss

* add test that triggers the weight converter or only dynamic loading

* Update test scripts to use new tensor parallel test keyword

- Modified `run_dense_tests.sh` and `run_moe_tests.sh` to change the pytest keyword from "test_tensor_parallel" to "test_tp_" for improved test targeting.
- Cleaned up comments and removed unused code in `test_tensor_parallel_mixin.py` to streamline the testing process and enhance readability.

* cleaning + find_port + remove comments
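
A `find_port` helper for distributed tests is typically a small stdlib sketch like the following (name and shape assumed, not taken from the actual diff): bind to port 0 so the OS picks a free ephemeral port, then release it for the process-group init to use.

```python
import socket

def find_free_port() -> int:
    """Ask the OS for a free TCP port by binding to port 0,
    then release it so torch.distributed init can reuse it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

Note the small race: another process can grab the port between release and reuse, which is why the later "fix port conflict in test" commit is a common follow-up for this pattern.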

* revert some broken changes

* when you are stupid sometimes you really need a brain :) :) :) :)

* fix TP

* Ok GPT oss is fixed now

* try to fix perms

* test only causal llm

* attempt to fix

* am I a doomer and AI is not that bad?

* fix

* it "passes" but the output is garbage

* style my man

* outputs are gonna be gibberish but at least the forward pass "works"

* style

* fix mixtral

* okay shape fixes

* tensor idx is only for grouped gemm / EP

* fix gate_up shard
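
The gate_up bug class is worth spelling out: a fused `gate_up_proj` weight stacks the gate and up halves along the output dimension, so a naive contiguous column shard would hand one rank only gate columns. Each half has to be sliced separately so every rank gets matching gate and up slices. A minimal sketch over output rows (function name illustrative, plain lists standing in for tensor rows):

```python
def shard_fused_gate_up(rows, tp_rank, tp_size):
    """Shard a fused gate_up weight (gate rows followed by up rows along
    the output dim). Slice each half separately so every rank receives
    matching gate and up slices instead of a contiguous block."""
    half = len(rows) // 2
    gate, up = rows[:half], rows[half:]
    per_rank = half // tp_size
    lo, hi = tp_rank * per_rank, (tp_rank + 1) * per_rank
    return gate[lo:hi] + up[lo:hi]
```

For 8 output rows and 2 ranks, rank 0 gets rows `[0, 1]` of gate plus rows `[4, 5]` of up, rather than the incorrect contiguous `[0, 1, 2, 3]`.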

* fix :)

* revert some EP changes that are breaking other stuff

* style

* fix solar open tp

* trigger test on deepseek v3

* fix glm4_moe tp

* fix glm4 moe lite tensor parallel

* fix longcat and glm4_moe_lite by all reducing gradients of k_rot

* fix ernie4_5_moe

* fix qwen3 by all reduce grads of q_norm

* fix deepseek v3 tp (needs constant dropout, otherwise RNG differs across ranks + all_reduce backward for K rotary)

* Rename ReplicatedInTP to ReplicatedWithGradAllReduce and update references in tensor_parallel.py
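
The `ReplicatedWithGradAllReduce` idea: a parameter kept identical on every rank is the identity in the forward pass, but its gradients must be all-reduced in the backward pass so the replicas stay in sync after the optimizer step. A pure-Python sketch of the semantics (simulating per-rank gradients as a list, standing in for `torch.distributed.all_reduce`):

```python
def replicated_forward(x):
    """Forward is the identity: the parameter is replicated on every rank."""
    return x

def replicated_backward(grads_per_rank):
    """Backward all-reduces (sums) the per-rank gradients so every replica
    applies the same update and the copies never drift apart."""
    total = sum(grads_per_rank)
    return [total for _ in grads_per_rank]
```

Without the backward all-reduce, each rank would update its copy with only its local gradient, which is exactly the drift the k_rot/q_norm fixes in the commits above address.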

* fix minimax_m2

* fix deepseek v2 for TP

* fix minimax

* fix qwen3_next for TP

* fix dots1 tp

* fix flex_olmo TP

* fix qwen3 tp dense

* fix exaone4 tp

* fix gemma3 tp

* fix apertus TP

* fix seed_oss tp by setting dropout to 0

* fix gemma3n for TP

* dropout set to 0 for test + gradient slicing depending on fused weights or not
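
Zeroing dropout for these tests matters because dropout draws from per-process RNG state, so TP and single-device runs diverge even with identical weights. A hedged sketch of the config scrub (helper name assumed; real configs are `PretrainedConfig` objects, a plain object stands in here):

```python
def zero_all_dropout(config):
    """Set every *dropout attribute on a config object to 0.0 so TP and
    single-device runs produce identical, deterministic activations."""
    for name in list(vars(config)):
        if name.endswith("dropout"):
            setattr(config, name, 0.0)
```

Non-dropout attributes are left untouched, so the rest of the model definition is unchanged.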

* make fixup + important glm4 fix on tp plan to avoid assigning the wrong TP plan

* linting

* remove shell scripts

* make tensor parallel tests trigger the CI

* fix ci

* fix ci

* mark it as ep_plan

* add @require_torch_multi_accelerator

* fix CI

* undo pr merge tensor parallel

* revert core model loading file

* revert modeling_utils file

* small fix in modeling_utils

* Update tensor parallel test configurations to enable tests by default and standardize seed values for reproducibility.

* linting

* Reorganize imports in modeling_utils.py to maintain consistency

* fix qwen3_5_moe tp

* fix glm moe dsa tp

* fix qwen3_5 tp

* Add training_overfit_steps parameter to Gemma3nTextModelTest

* fix 16 bytes alignment

* Add WeightConverter for gate_up_proj and down_proj with 16 bytes alignment in checkpoint mapping

* Add solar_open mapping with WeightConverter for gate_up_proj and down_proj, ensuring 16 bytes alignment
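
The 16-byte alignment requirement reduces to one piece of arithmetic: round a byte offset or size up to the next multiple of 16 so shards start on boundaries the fused kernels accept. A minimal sketch (function name illustrative):

```python
def align_up(num_bytes: int, alignment: int = 16) -> int:
    """Round a byte count up to the next multiple of `alignment`.
    Works for any positive power-of-two or non-power-of-two alignment."""
    return (num_bytes + alignment - 1) // alignment * alignment
```

So a 17-byte span is padded to 32 bytes, while a span that is already a multiple of 16 is left as-is.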

* Update hub metadata (#43892)

* update

* reorder

* Add MlaKvAProjParallel layer for MLA attention and update TP plans

- Introduced MlaKvAProjParallel class to handle kv_a_proj_with_mqa in tensor parallelism.
- Updated prepare_module_tp methods to accept model parameter for better integration.
- Adjusted base_model_tp_plan in various configurations to include mla_kv_a_proj.
- Removed redundant all_reduce_backward calls from DeepseekV2 and DeepseekV3 attention implementations.
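
A TP plan of this kind is a mapping from module-name patterns to parallelization strategies; the sketch below shows the general shape with illustrative keys (the pattern strings and the `mla_kv_a_proj` strategy name follow the commit description but are not copied from the diff):

```python
# Hypothetical shape of a base_model_tp_plan entry: module-name
# patterns mapped to sharding strategies, with the MLA-specific
# strategy handling kv_a_proj_with_mqa.
base_model_tp_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",
    "layers.*.self_attn.kv_a_proj_with_mqa": "mla_kv_a_proj",
}
```

Routing the MLA projection through its own strategy is what makes the per-attention `all_reduce_backward` calls redundant and lets them be removed from the model code.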

* fix doc

* force 16 Bytes Alignment

* fix slice tensor

* more doc

* better abstraction for zero experts

* linting

* refactor

* remove redundancy in tests

* simplify

* revert

* fix gemma2

* fix

* make tests work only on CPU

* linting

* skip tests for run_slow

* cleaning

* cleaning

* enhance doc on dynamic weight loading

* add config instead of model for tp

* more doc to tensor parallel for MlaKvAProjParallel

* use -1 instead of self.num_heads; this way, when TP is used, the local num_heads size can be inferred
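
The reason `-1` works: under TP each rank holds `num_heads // tp_size` heads, so the sharded projection width divided by `head_dim` gives the local head count, which is exactly what `.view(batch, seq, -1, head_dim)` infers implicitly. The arithmetic as a sketch (helper name illustrative):

```python
def local_num_heads(local_proj_out: int, head_dim: int) -> int:
    """Infer the per-rank head count from the sharded projection width,
    mimicking what view(batch, seq, -1, head_dim) does implicitly."""
    assert local_proj_out % head_dim == 0, "shard width must be a multiple of head_dim"
    return local_proj_out // head_dim
```

E.g. a 1024-wide q_proj with head_dim 64 gives 16 heads on a single device, but a 256-wide shard (tp_size=4) correctly gives 4 local heads, where a hard-coded `self.num_heads` would break the reshape.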

* fix modular glm_moe_dsa

* collect all gradient failure tests before stopping at the first one

* generate more max new tokens for tensor parallel tests as models are small

Co-authored-by: Arthur <[email protected]>

* compare generated tokens for tensor parallel tests
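
Comparing generated token ids (rather than raw logits) is the right equivalence check for TP: small numerical drift between the sharded and unsharded runs is expected, but the greedy argmax path should be identical. A minimal sketch of the comparison (function name illustrative):

```python
def outputs_match(tokens_tp, tokens_single) -> bool:
    """TP correctness check: compare generated token ids, not logits.
    Minor floating-point drift is tolerated as long as the decoded
    token sequence is identical."""
    return list(tokens_tp) == list(tokens_single)
```

A logit-level `allclose` check would need a tolerance tuned per model; token-level comparison sidesteps that while still catching real sharding bugs.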

* use attr config as much as possible

* add TP + quantized tests

* raise error if attr does not exist to say add it to the auto mapping

* update doc

* install torchao for tp + quantization tests

* update doc

* update doc

* update doc

* update doc

* update doc

* update doc

* partially fix tp + quantization generation

* partially fix tp + quantize

* skipping some tp + quantized test for now

* guard torchao import for test_training_ci

* Update src/transformers/models/longcat_flash/modular_longcat_flash.py

Co-authored-by: Arthur <[email protected]>

* move file

* fix linting

* fix linting

* fix port conflict in test

---------

Co-authored-by: Arthur Zucker <[email protected]>
Co-authored-by: Raushan Turganbay <[email protected]>
Co-authored-by: Arthur <[email protected]>
Ferdinand Mom committed
f49c720f52a08ad68f9f1d299cf65e7125d2e359
Parent: 5c1c72b
Committed by GitHub <[email protected]> on 3/4/2026, 8:57:50 AM