Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
docs: update `optimizer_zero_grad` order, and the backward pass. (#21144)
GdoongMathew committed
06bed20190c2e428c00ef73c6aa70ab423b2a47a
Parent: da7f2f9
Committed by GitHub
on 9/2/2025, 7:57:23 AM