Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
Internal Refactor: Reroute Implementations (#21354)
* forward xla impl
* forward logger implementation
* forward logger implementation: mlflow
* update neptune logger
* forward kubeflow implementation
* forward lsf env
* move torchelastic
* update xla env
* forward bitsandbytes
* forward deepspeed precision
* forward transformer engine
* forward XLA precision
* forward deepspeed strategy fabric
* integrate xla strategies
* update pytorch deepspeed precision
* forward trainer xla single device
* XLA ddp trainer
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update fabric tests
* fabric tests
* tests
* update version
* update
* update
* update
* update
* update
* update
* fix doc issue
* fix mypy issue
* fix readthedocs and ci cpu tests
* update
* update
* update
* update
* update
* update
* fix deepspeed assertion
* update
* fix transformer engine mock
* update
* logger mocks
* add tpu mocks
* update
* update
* update
* update
* fix docmake
* update
* update
* fix loggers error
* update
* update
* update
* update
* pin cuda version
* update
* try with removing libnccl downloading
* undo cuda pinning
* update
* update
* correctly handle model property
* update error types and add property forwarding
* update
* update
* update
* meow meow
* claymore!!!
* remove todo
* remove todos + version
* retrigger-ci to fix ple release issue
* fix mocks xla

---------

Co-authored-by: Deependu Jha <[email protected]>
Co-authored-by: Bhimraj Yadav <[email protected]>
Justus Schock committed
9a10959f255a3a1700da525114c1f1070fba5ded
Parent: 8ac4843
Committed by GitHub <[email protected]>
on 11/21/2025, 11:54:12 AM