Making large AI models cheaper, faster and more accessible
[shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)
* implement sharded optimizer saving
* add more param info
* finish implementation of sharded optimizer saving
* fix bugs in optimizer sharded saving
* add pp+zero test
* param group loading
* greedy loading of optimizer
* fix bug when loading
* implement optimizer sharded saving
* add optimizer test & arrange checkpointIO utils
* fix gemini sharding state_dict
* add verbose option
* add loading of master params
* fix typehint
* fix master/working mapping in fp16 amp
Baizhou Zhang committed
c9625dbb6364c10f21828b30bc58e8fbcf22a900
Parent: 2c787d7
Committed by GitHub <noreply@github.com>
on 8/31/2023, 6:50:47 AM