`best_model_path` does not retrieve the path to the best monitor checkpoint file
🐛 Bug
If there is more than one `ModelCheckpoint` in the callback list, and the first one does NOT set `monitor`, then `self.checkpoint_callback.best_model_path` is wrong: it does not point to the best checkpoint for the monitored metric.
e.g.
```python
import pytorch_lightning as pl

# `my_monitor` is the monitored metric name, defined elsewhere.

callbacks = []

# Saves a checkpoint every epoch; no `monitor` is set.
val_ckpt_callback = pl.callbacks.ModelCheckpoint(
    filename="val_end-{epoch}-{step}-{val_loss:.4f}-{val_ppl:.4f}",
    save_top_k=-1,
    every_n_epochs=1,
)
callbacks.append(val_ckpt_callback)

# Tracks the single best model according to `my_monitor`.
monitor_ckpt_callback = pl.callbacks.ModelCheckpoint(
    filename="monitor-{epoch}-{step}-{" + my_monitor + ":.4f}",
    monitor=my_monitor,
    save_top_k=1,
)
callbacks.append(monitor_ckpt_callback)
```
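The failure mode can be illustrated with a small, self-contained sketch (plain Python stand-ins for `ModelCheckpoint`, not the real class; the paths are made up): `trainer.checkpoint_callback` resolves to the first `ModelCheckpoint` in the list, so its `best_model_path` is just the last periodic save, not the best monitored model.

```python
# Hypothetical stand-in for pl.callbacks.ModelCheckpoint, for illustration
# only; the real callback fills best_model_path during training.
class FakeCheckpoint:
    def __init__(self, monitor=None):
        self.monitor = monitor
        self.best_model_path = ""

periodic = FakeCheckpoint(monitor=None)         # like val_ckpt_callback above
monitored = FakeCheckpoint(monitor="val_loss")  # like monitor_ckpt_callback
periodic.best_model_path = "val_end-epoch=9.ckpt"   # merely the last save
monitored.best_model_path = "monitor-epoch=3.ckpt"  # the true best model
callbacks = [periodic, monitored]

# What trainer.checkpoint_callback does today: the first callback wins.
first = callbacks[0]
print(first.best_model_path)   # val_end-epoch=9.ckpt -- not the best model

# Workaround: reference the callback that actually monitors a metric.
best = next(cb for cb in callbacks if cb.monitor is not None)
print(best.best_model_path)    # monitor-epoch=3.ckpt
```

Until this is fixed, keeping a direct reference to `monitor_ckpt_callback` and reading its `best_model_path` attribute sidesteps the problem entirely.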
To Reproduce
Expected behavior
The checkpoint for the best monitored value should always be saved, and `best_model_path` should point to it.
Environment
- CUDA:
  - GPU: A100-SXM4-40GB (×8)
  - available: True
  - version: 11.3
- Packages:
  - numpy: 1.20.1
  - pyTorch_debug: False
  - pyTorch_version: 1.10.2+cu113
  - pytorch-lightning: 1.5.10
  - tqdm: 4.63.0
- System:
  - OS: Linux
  - architecture: 64bit
  - processor: x86_64
  - python: 3.7.10
  - version: #1 SMP Fri Mar 19 10:07:22 CST 2021
cc @carmocca @awaelchli @ninginthecloud @jjenniferdai @rohitgr7
Issue Analytics
- Created: a year ago
- Comments: 8 (6 by maintainers)
As I said in my previous message: "so that passing `ckpt_path='best'` still works for them". People want to be able to pass `ckpt_path='best'` regardless of their monitor config. In this case, it would equal the last checkpoint saved. This behavior could be changed if we have #11912.
Well, yeah, that's true, or you can pass the checkpoint path directly. `ckpt_path='best'` selects the first checkpoint callback and extracts the best model path from it. It was kept like this as a quick, handy feature for users, since in the majority of cases there is only one checkpoint callback. Maybe we could extend it a little: if `best` is selected and multiple `ModelCheckpoint` callbacks are configured, we could raise a warning or error. cc @carmocca wdyt?
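A minimal sketch of that proposed extension (a hypothetical helper, not Lightning's actual implementation): keep resolving `'best'` from the first checkpoint callback for backward compatibility, but warn when the choice is ambiguous.

```python
import warnings

def resolve_best_ckpt(callbacks):
    # Hypothetical helper mirroring the current behavior (first checkpoint
    # callback wins) while warning when multiple are configured, since
    # ckpt_path='best' is then ambiguous.
    ckpts = [cb for cb in callbacks if hasattr(cb, "best_model_path")]
    if not ckpts:
        raise ValueError("ckpt_path='best' requires a ModelCheckpoint callback")
    if len(ckpts) > 1:
        warnings.warn(
            "Multiple ModelCheckpoint callbacks configured; ckpt_path='best' "
            "uses the first one, which may not track your monitor."
        )
    return ckpts[0].best_model_path
```

With a single checkpoint callback this behaves exactly as today; with several, the user at least gets a signal that the resolved path may not be the best monitored model.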