set global_step from checkpoint
See original GitHub issue
❓ Questions/Help/Support
Hello, I wonder how I am supposed to set the global_step from a checkpoint for handlers.
Specifically, I would like most of the handlers in the TensorBoard logger to use engine.state.iteration as the global step. Since engine.state.iteration is reset when I call engine.run, I had to create my own counter (like the following), which simply adds an offset to the current iteration, and pass it to the global_step_transform argument of every output handler (a usage sketch follows the class below).
class global_step_counter:
    """Adds a fixed offset (the iteration restored from a checkpoint) to the engine's current iteration."""

    def __init__(self, resume_from_iter):
        self.resume_from_iter = resume_from_iter

    def set_starting_val(self, new_val):
        self.resume_from_iter = new_val

    # Recent ignite versions call global_step_transform as fn(engine, event_name),
    # so event_name is accepted (and ignored) here.
    def __call__(self, engine, event_name=None):
        return engine.state.iteration + self.resume_from_iter
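Below is a minimal sketch (not from the original issue) of how such a counter could be passed as global_step_transform to a TensorBoard OutputHandler; the dummy training step, the log directory, the "training" tag and the offset of 1000 are illustrative placeholders, not values from the report.

from ignite.engine import Engine, Events
from ignite.contrib.handlers.tensorboard_logger import TensorboardLogger, OutputHandler

# Dummy training step for illustration only
trainer = Engine(lambda engine, batch: {"loss": float(batch)})

# Offset that would have been read back from a checkpoint
counter = global_step_counter(resume_from_iter=1000)

tb_logger = TensorboardLogger(log_dir="tb_logs")
tb_logger.attach(
    trainer,
    log_handler=OutputHandler(
        tag="training",
        output_transform=lambda out: {"loss": out["loss"]},
        global_step_transform=counter,  # logged step = iteration + offset
    ),
    event_name=Events.ITERATION_COMPLETED,
)

trainer.run([0.5, 0.4, 0.3], max_epochs=1)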
However, I realize the following handlers do not provide a global_step_transform option:
- OptimizerParamsHandler
- WeightsScalarHandler
- GradsScalarHandler
- WeightsHistHandler
- GradsHistHandler

Instead, they use get_event_attrib_value to get the global_step, which makes this value inconsistent with the others (a sketch of the mismatch follows this list).
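Continuing the sketch above, attaching one of these handlers to the same logger illustrates the mismatch; the nn.Linear model is a placeholder, and the behaviour noted in the comments is my reading of the issue, not verified against a specific ignite version.

import torch.nn as nn
from ignite.contrib.handlers.tensorboard_logger import WeightsScalarHandler

model = nn.Linear(2, 1)  # placeholder model for illustration

# WeightsScalarHandler takes no global_step_transform argument; per the issue it
# derives its step from engine.state via get_event_attrib_value, so its x-axis
# will not include the resume offset applied by the counter above.
tb_logger.attach(
    trainer,
    log_handler=WeightsScalarHandler(model),
    event_name=Events.ITERATION_COMPLETED,
)
# Result: OutputHandler logs at step = iteration + offset,
# while WeightsScalarHandler logs at step = iteration only.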
Thank you
Issue Analytics
- Created: 4 years ago
- Comments: 7
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thank you @vfdev-5! This solved my issue.
Here is a minimal example @vfdev-5; you can see the iteration is indeed reset.
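The snippet and its output are not reproduced on this page; a minimal sketch of the behaviour being described, assuming the trainer is rebuilt after a restart and only model/optimizer weights are restored from the checkpoint, might look like this:

from ignite.engine import Engine

def step(engine, batch):
    return batch

trainer = Engine(step)
trainer.run(range(10), max_epochs=1)
print(trainer.state.iteration)  # 10

# After restarting the script and rebuilding the engine, the counter starts from zero again
trainer = Engine(step)
trainer.run(range(10), max_epochs=1)
print(trainer.state.iteration)  # 10 again, not 20 -- the global step is "reset"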