Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
Fix double iteration bug when resumed from a checkpoint. (#20775)
* Fix double iteration bug when resumed from a checkpoint.
* Apply suggestions from code review
* update wording in the comments.

Signed-off-by: sudipto baral <[email protected]>

* update test

Signed-off-by: sudipto baral <[email protected]>

* Add independent flag to track checkpoint resumption.

Signed-off-by: sudipto baral <[email protected]>

* lint

Signed-off-by: sudipto baral <[email protected]>

* update
* Update src/lightning/pytorch/loops/training_epoch_loop.py

Co-authored-by: Copilot <[email protected]>

* Update .github/workflows/ci-tests-pytorch.yml
* update
* skip

---------

Signed-off-by: sudipto baral <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Bhimraj Yadav <[email protected]>
Co-authored-by: Deependu <[email protected]>
Co-authored-by: Copilot <[email protected]>
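The commit message describes the fix only at a high level. What follows is a minimal, hypothetical Python sketch of the general idea it names: tracking checkpoint resumption with an independent flag, so that batches already completed before the checkpoint are skipped exactly once and none is iterated twice. This is not Lightning's implementation. `EpochLoopState`, `batches_done`, `resumed_from_checkpoint` and `run_epoch` are illustrative names and are not APIs from `training_epoch_loop.py`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EpochLoopState:
    """Hypothetical progress tracker for a single training epoch."""

    batches_done: int = 0  # number of batches fully processed so far
    # Independent flag: set only when state is restored from a checkpoint,
    # never inferred from the value of batches_done itself.
    resumed_from_checkpoint: bool = False

    def state_dict(self) -> dict:
        # Only the progress counter is saved; the flag describes the run, not the data.
        return {"batches_done": self.batches_done}

    def load_state_dict(self, state: dict) -> None:
        self.batches_done = state["batches_done"]
        self.resumed_from_checkpoint = True


def run_epoch(state: EpochLoopState, batches: List[int]) -> List[int]:
    """Process the remaining batches and return the ones actually seen."""
    seen: List[int] = []
    # On resume, skip exactly the batches recorded in the checkpoint;
    # otherwise start the epoch from the first batch.
    start = state.batches_done if state.resumed_from_checkpoint else 0
    # Consume the flag so the skip is applied once and not on later epochs.
    state.resumed_from_checkpoint = False
    for idx in range(start, len(batches)):
        seen.append(batches[idx])
        state.batches_done = idx + 1
    return seen
```

Keeping the flag separate from the counter means a restored `batches_done` is used as a skip offset only for the first epoch after loading. Once the flag is consumed, a later call to `run_epoch` starts again from batch 0 instead of replaying or skipping batches based on stale progress.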
Sudipto Baral committed
25b1343f1c112f6f8a87e07b336c1899c4065761
Parent: fb2e8d3
Committed by GitHub <[email protected]>
on 8/5/2025, 12:24:24 AM