Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
Add `save_on_exception` option to `ModelCheckpoint` (#20916)
* add saving of checkpoint if an exception is raised
* import callback to checkpoint test file
* add test for exception in training callbacks

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Jirka B <[email protected]>
vsey committed
6f93a90d49df5df1d1b04e88b1c5a89c334d3d5e
Parent: 577c04d
Committed by GitHub <[email protected]>
on 8/12/2025, 5:57:01 PM
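
The behaviour described in the commit message (write a checkpoint when training raises) can be illustrated with a small, library-free sketch. It does not use or reproduce Lightning's code: `ToyCheckpoint`, `toy_fit`, `run_demo`, the JSON file format, and the file naming are all hypothetical stand-ins, and only the option name `save_on_exception` is taken from the commit title.

```python
import json
import tempfile
from pathlib import Path


class ToyCheckpoint:
    """Hypothetical stand-in for a checkpoint callback; not Lightning's class."""

    def __init__(self, dirpath, save_on_exception=False):
        self.dirpath = Path(dirpath)
        self.save_on_exception = save_on_exception

    def on_exception(self, state, exception):
        # Only write an emergency checkpoint when the option is enabled.
        if not self.save_on_exception:
            return None
        self.dirpath.mkdir(parents=True, exist_ok=True)
        path = self.dirpath / f"exception-step={state['step']}.json"
        path.write_text(json.dumps({**state, "error": type(exception).__name__}))
        return path


def toy_fit(callback, num_steps, fail_at):
    """Toy training loop that notifies the callback on failure, then re-raises."""
    state = {"step": 0, "weight": 0.0}
    try:
        for step in range(num_steps):
            if step == fail_at:
                raise RuntimeError("simulated failure")
            state["weight"] += 0.1
            state["step"] = step + 1
    except Exception as exc:
        callback.on_exception(state, exc)
        raise  # the original error still propagates to the caller
    return state


def run_demo(save_on_exception):
    """Run the failing loop and return the names of the files it wrote."""
    with tempfile.TemporaryDirectory() as tmp:
        callback = ToyCheckpoint(tmp, save_on_exception=save_on_exception)
        try:
            toy_fit(callback, num_steps=5, fail_at=3)
        except RuntimeError:
            pass
        return sorted(p.name for p in Path(tmp).iterdir())
```

In this sketch the exception is re-raised after the checkpoint is written, so the failure is never hidden from the caller. The option also defaults to off, which leaves runs that do not opt in unchanged.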