[Train] Support custom checkpoint file names
Currently, the file names for Ray Train checkpoints are not customizable. They will always be of the format checkpoint_XXX.
Provide a way for the user to specify the name of the checkpoint file that they save.
One possible API is to allow the user to specify the checkpoint file name in train.save_checkpoint().
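To make the request concrete, here is a minimal sketch of the behavior being asked for, written as a standalone helper and not as Ray Train's actual implementation. The function name `save_checkpoint_to`, its `file_name` parameter, the use of pickle, and the six-digit zero padding of the default name are all assumptions made for illustration.

```python
import pickle
from pathlib import Path
from typing import Any, Dict, Optional


def save_checkpoint_to(
    state: Dict[str, Any],
    checkpoint_dir: Path,
    checkpoint_id: int,
    file_name: Optional[str] = None,
) -> Path:
    """Hypothetical helper: write `state` into `checkpoint_dir`.

    Falls back to a checkpoint_XXX-style name when `file_name` is not given.
    """
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    # Assumed default: a zero-padded counter, mirroring the checkpoint_XXX pattern.
    name = file_name if file_name is not None else f"checkpoint_{checkpoint_id:06d}"
    path = checkpoint_dir / name
    with path.open("wb") as f:
        pickle.dump(state, f)
    return path
```

With a helper like this, a call without `file_name` keeps the current naming scheme, while passing `file_name="best_model.pkl"` would write the checkpoint under that name instead.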
Issue Analytics
- State:
- Created 2 years ago
- Comments:7 (7 by maintainers)
Thanks @andrijazz for providing this context. I apologize if my comment came off as trying to restrict users!
The reason I asked is that with the current API, even with the ability to customize individual checkpoint names, there may be some confusion, since they will be written to the <logdir>/run_<run_id>/checkpoints directory, which may change between runs. Perhaps we need to allow customization of this directory name as well…
Being able to specify a custom path and name for the checkpoints would be great.
One other use case that comes to mind is that a user might want to store checkpoints outside of the Ray-generated folders. For example, wandb creates its own directories and automatically uploads all files stored in them to the cloud after a run finishes. A user might want to store checkpoints inside the wandb directory because they can then easily browse through them in the wandb web app and decide which one to use based on the plots.
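The directory customization discussed in this thread could be sketched roughly as follows. This is not an existing Ray Train function: `resolve_checkpoint_dir` and its `custom_dir` parameter are hypothetical, and the default path simply follows the <logdir>/run_<run_id>/checkpoints layout mentioned above.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_checkpoint_dir(
    logdir: Union[str, Path],
    run_id: str,
    custom_dir: Optional[Union[str, Path]] = None,
) -> Path:
    """Hypothetical helper: pick the directory checkpoints are written to."""
    if custom_dir is not None:
        # A user-supplied directory wins, e.g. one that an experiment
        # tracker already syncs to the cloud.
        return Path(custom_dir)
    # Default layout described in the thread; it changes with every run_id.
    return Path(logdir) / f"run_{run_id}" / "checkpoints"
```

A user who wants checkpoints to land in a tracker-managed folder would pass that folder as `custom_dir`; everyone else would keep the per-run default.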