Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
StreamingDataLoader: Resolve fault tolerance with the CombinedStreamingDataset and multiple workers (#19326)
thomas chaton committed
ed367ca675861cdf40dbad2e4d66f7eee2ec50af
Parent: e1a6dd9
Committed by GitHub
on 1/23/2024, 5:54:10 PM