tensorflow / models

Models and examples built with TensorFlow

Support async checkpoint in Orbit trainer/controller.

This CL adds a field to the Orbit trainer/controller indicating whether async checkpointing is enabled for checkpoint saving. By default this value is set to False, which preserves the existing behavior.

In addition, a sync barrier is added at the end of training (in the controller) to make sure user code won't prematurely access the checkpoint file/state while an async checkpoint save is still ongoing.
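
For illustration only, the flag-plus-barrier pattern described above can be sketched with the Python standard library. This is not the Orbit implementation: the class `AsyncCheckpointSaver`, the flag name `enable_async_checkpointing`, and the helper `train` are hypothetical names chosen for this sketch, and the checkpoint write is simulated with a short sleep.

```python
import threading
import time


class AsyncCheckpointSaver:
    """Hypothetical stand-in for a checkpoint manager (not the Orbit API)."""

    def __init__(self, enable_async_checkpointing=False):
        # Assumed flag name; False keeps the blocking (existing) behavior.
        self.enable_async_checkpointing = enable_async_checkpointing
        self.saved_steps = []
        self._pending = None  # background save thread, if one is in flight

    def _write(self, step):
        time.sleep(0.05)  # simulate slow checkpoint I/O
        self.saved_steps.append(step)

    def save(self, step):
        # Wait for any earlier save so at most one write is in flight.
        self.sync()
        if self.enable_async_checkpointing:
            self._pending = threading.Thread(target=self._write, args=(step,))
            self._pending.start()
        else:
            self._write(step)

    def sync(self):
        # Barrier: block until the in-flight save (if any) has finished.
        if self._pending is not None:
            self._pending.join()
            self._pending = None


def train(saver, num_steps, checkpoint_interval):
    """Toy training loop that checkpoints periodically."""
    for step in range(1, num_steps + 1):
        if step % checkpoint_interval == 0:
            saver.save(step)
    # Sync barrier at the end of training, so callers never observe a
    # half-written checkpoint after train() returns.
    saver.sync()
    return saver.saved_steps
```

With the barrier in place, a caller sees the same completed checkpoints after `train` returns whether the flag is on or off; only the overlap between saving and training differs.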

PiperOrigin-RevId: 529300903
A. Unique TensorFlower committed
2b4fe39d40eaf7cb84c5f98a75e9212b33525e86
Parent: 6b2ed0d