Support training restoration
Subject of the feature
Currently, we use
model.load_weights to start supervised training from non-random weights.
However, if the intention is to resume a stopped run, this is not sufficient, because the optimizer state is not saved in the checkpoint.
The following piece of code can be helpful: https://github.com/tensorflow/tensorflow/issues/27861#issuecomment-487455939
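A minimal sketch of the idea, assuming a TensorFlow 2.x / Keras setup: checkpoint the model and the optimizer together with tf.train.Checkpoint, so that a stopped run can be resumed with the optimizer state intact. The model definition and checkpoint directory below are placeholders, not the project's actual code.

```python
import tensorflow as tf

# Placeholder model and optimizer; substitute the project's own objects.
model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
optimizer = tf.keras.optimizers.Adam(1e-3)

# Track both the weights and the optimizer state in one checkpoint object.
ckpt = tf.train.Checkpoint(model=model, optimizer=optimizer)
manager = tf.train.CheckpointManager(ckpt, directory="./ckpts", max_to_keep=3)

# Restore the latest checkpoint if one exists; otherwise start from scratch.
ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Resumed from", manager.latest_checkpoint)
else:
    print("Initializing from scratch.")

# ... training loop ...
# manager.save()  # call periodically to persist model + optimizer state
```

Unlike model.load_weights, restoring through tf.train.Checkpoint also brings back the optimizer's internal variables, so the run continues as if it had never been interrupted.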
- Created 3 years ago
- Comments: 5 (2 by maintainers)
Top GitHub Comments
What is the optimizer state useful for when resuming?
If you do not have the optimizer state (for Adam, for example, the running first- and second-moment estimates of the gradients), then after reloading the checkpoint the first gradient updates can be large and harmful, which leads to inferior performance.
With the optimizer state, you can continue training as if it had never stopped. This piece of code has been tested in other projects, so it is already verified.
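To make the point concrete, here is a small hypothetical illustration (not from the issue) of the state that is lost when only model weights are restored: after a single training step, Adam already holds per-variable moment estimates plus an iteration counter, none of which are touched by model.load_weights. This assumes a TF 2.x optimizer where variables() is callable; in newer Keras versions optimizer.variables is a property instead.

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,))])
optimizer = tf.keras.optimizers.Adam(1e-3)
model.compile(optimizer=optimizer, loss="mse")

# One step of training populates the optimizer's slot variables.
x = tf.random.normal((16, 8))
y = tf.random.normal((16, 4))
model.fit(x, y, epochs=1, verbose=0)

# Everything the optimizer has accumulated about the gradients so far;
# restoring only the model weights discards all of it.
for v in optimizer.variables():
    print(v.name, v.shape)
```

Resuming without these variables effectively resets Adam's warm-up, which is why the first updates after a weights-only restore can be disproportionately large.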