Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
Multinode on MPS (#15748)
* Fix restarting attribute for lr finder * update lite executor * update trainer executor * update spawn executor * add multinode component tests * add testing helpers * add lite tests * add trainer tests * update changelog * update trainer * update workflow * update tests * debug * add reason for skipif * Apply suggestions from code review * switch skipif Co-authored-by: Jirka <jirka.borovec@seznam.cz> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
J
Justus Schock committed
36aecde6957ce0c6ebb4e4b8bf8c1738a4c253b6
Parent: 8475f85
Committed by GitHub <noreply@github.com>
on 12/8/2022, 12:05:18 PM