[feat] Allow overriding optimizer_zero_grad and/or optimizer_step when using accumulate_grad_batches (#7980)
David Chan committed
c6e02e481eebaa48eda3877ab79a749e8635c500
Parent: eebdc91
Committed by GitHub <[email protected]>
on 6/17/2021, 10:50:37 AM
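The behavior this commit enables can be sketched as follows: when `accumulate_grad_batches` is set, the trainer only invokes `optimizer_step` and `optimizer_zero_grad` on accumulation boundaries, and user overrides of those hooks are now respected there too. This is a minimal illustrative sketch of that pattern in plain Python, not Lightning's actual internals; `ToyOptimizer` and `run_loop` are hypothetical stand-ins (Lightning's real hooks live on `LightningModule`).

```python
# Illustrative sketch (not Lightning's real code): the optimizer_step /
# optimizer_zero_grad hooks fire only every `accumulate_grad_batches`
# batches, and callers may swap in their own hook implementations.

class ToyOptimizer:
    """Hypothetical optimizer that just counts hook invocations."""
    def __init__(self):
        self.steps = 0
        self.zero_grads = 0
    def step(self):
        self.steps += 1
    def zero_grad(self):
        self.zero_grads += 1

def run_loop(num_batches, accumulate_grad_batches, optimizer,
             optimizer_step=None, optimizer_zero_grad=None):
    # Default hooks mirror the stock behavior; overrides replace them.
    step_hook = optimizer_step or (lambda opt, batch_idx: opt.step())
    zero_hook = optimizer_zero_grad or (lambda opt, batch_idx: opt.zero_grad())
    for batch_idx in range(num_batches):
        # backward() would accumulate gradients on every batch here.
        if (batch_idx + 1) % accumulate_grad_batches == 0:
            # Only on accumulation boundaries do the hooks run.
            step_hook(optimizer, batch_idx)
            zero_hook(optimizer, batch_idx)

opt = ToyOptimizer()
run_loop(num_batches=8, accumulate_grad_batches=4, optimizer=opt)
print(opt.steps, opt.zero_grads)  # → 2 2
```

With 8 batches and `accumulate_grad_batches=4`, the hooks fire twice (after batches 4 and 8); passing a custom `optimizer_step` or `optimizer_zero_grad` callable substitutes the user's logic at exactly those points.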