`best_model_path` does not retrieve the path to the best monitor checkpoint file

🐛 Bug

If there is more than one ModelCheckpoint callback and the first one in the callback list does NOT set monitor, self.checkpoint_callback.best_model_path is wrong: it does not point to the best checkpoint for the monitored metric. For example:

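import pytorch_lightning as pl

my_monitor = "val_loss"  # hypothetical: the metric name monitored in the original report

callbacks = []
# Saves a checkpoint every epoch; no monitor is set.
val_ckpt_callback = pl.callbacks.ModelCheckpoint(
    filename="val_end-{epoch}-{step}-{val_loss:.4f}-{val_ppl:.4f}",
    save_top_k=-1,
    every_n_epochs=1
)
callbacks.append(val_ckpt_callback)
# Keeps only the best checkpoint according to my_monitor.
monitor_ckpt_callback = pl.callbacks.ModelCheckpoint(
    filename="monitor-{epoch}-{step}-{" + my_monitor + ":.4f}",
    monitor=my_monitor,
    save_top_k=1
)
callbacks.append(monitor_ckpt_callback)

trainer = pl.Trainer(callbacks=callbacks)
# trainer.checkpoint_callback is val_ckpt_callback (the first one), so its
# best_model_path is not the best checkpoint by my_monitor.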

Related code: https://github.com/PyTorchLightning/pytorch-lightning/blob/b2e98d61661fca80b87e1e2b49cd301d29667ce5/pytorch_lightning/trainer/trainer.py#L2342-L2353
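
The selection problem can be modeled in plain Python. This is a simplified sketch, not Lightning's code: FakeCheckpoint is a hypothetical stand-in for ModelCheckpoint that keeps only monitor and best_model_path, first_checkpoint mimics the "first checkpoint callback wins" rule described in the comments below, and monitored_checkpoint is a hypothetical workaround that picks the first callback that actually sets a monitor. The file names are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeCheckpoint:
    """Hypothetical stand-in for ModelCheckpoint: only the two fields used here."""

    monitor: Optional[str]
    best_model_path: str


def first_checkpoint(callbacks: List[object]) -> Optional[FakeCheckpoint]:
    """Mimic the rule described in the thread: the first checkpoint callback wins."""
    ckpts = [cb for cb in callbacks if isinstance(cb, FakeCheckpoint)]
    return ckpts[0] if ckpts else None


def monitored_checkpoint(callbacks: List[object]) -> Optional[FakeCheckpoint]:
    """Hypothetical workaround: return the first callback that sets a monitor."""
    for cb in callbacks:
        if isinstance(cb, FakeCheckpoint) and cb.monitor is not None:
            return cb
    return None


# Same order as the report: the unmonitored callback comes first.
callbacks = [
    FakeCheckpoint(monitor=None, best_model_path="val_end-epoch=9-step=999.ckpt"),
    FakeCheckpoint(monitor="val_loss", best_model_path="monitor-epoch=3-step=399.ckpt"),
]

print(first_checkpoint(callbacks).best_model_path)      # val_end-epoch=9-step=999.ckpt
print(monitored_checkpoint(callbacks).best_model_path)  # monitor-epoch=3-step=399.ckpt
```

In a real script, the same idea amounts to keeping a reference to the monitored callback and passing its best_model_path as ckpt_path to trainer.test, as suggested later in the thread.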

To Reproduce

Expected behavior

best_model_path should always point to the best checkpoint according to the monitored metric.

Environment

  • CUDA: GPU: 8 × A100-SXM4-40GB; available: True; version: 11.3
  • Packages: numpy: 1.20.1; pyTorch_debug: False; pyTorch_version: 1.10.2+cu113; pytorch-lightning: 1.5.10; tqdm: 4.63.0
  • System: OS: Linux; architecture: 64bit; processor: x86_64; python: 3.7.10; version: #1 SMP Fri Mar 19 10:07:22 CST 2021

cc @carmocca @awaelchli @ninginthecloud @jjenniferdai @rohitgr7

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (6 by maintainers)

Top GitHub Comments

2 reactions
carmocca commented, Mar 28, 2022

> if the monitor is None, why do we need to save the best_model_path?

As I said in my previous message: “so that passing ckpt_path='best' still works for them”.

People want to be able to pass ckpt_path='best' regardless of their monitor configuration. When monitor is None, 'best' resolves to the last checkpoint saved.

This behavior could change once #11912 is addressed.
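
The "best equals last saved when there is no monitor" behavior can be illustrated with a toy tracker. This is a simplified sketch, not ModelCheckpoint's implementation: BestTracker and its save method are hypothetical, the file names and scores are made up, and mode="min" mirrors ModelCheckpoint's default mode.

```python
from typing import Optional


class BestTracker:
    """Hypothetical toy model of how best_model_path evolves (not Lightning code)."""

    def __init__(self, monitor: Optional[str] = None, mode: str = "min") -> None:
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.monitor = monitor
        self.mode = mode
        self.best_model_path = ""
        self.best_score: Optional[float] = None

    def save(self, path: str, metrics: dict) -> None:
        if self.monitor is None:
            # No monitor: "best" simply tracks the latest checkpoint saved.
            self.best_model_path = path
            return
        score = metrics[self.monitor]
        better = (
            self.best_score is None
            or (self.mode == "min" and score < self.best_score)
            or (self.mode == "max" and score > self.best_score)
        )
        if better:
            self.best_score = score
            self.best_model_path = path


history = [
    ("epoch0.ckpt", {"val_loss": 0.9}),
    ("epoch1.ckpt", {"val_loss": 0.4}),
    ("epoch2.ckpt", {"val_loss": 0.6}),
]

no_monitor = BestTracker()
with_monitor = BestTracker(monitor="val_loss")
for path, metrics in history:
    no_monitor.save(path, metrics)
    with_monitor.save(path, metrics)

print(no_monitor.best_model_path)    # epoch2.ckpt (last saved)
print(with_monitor.best_model_path)  # epoch1.ckpt (lowest val_loss)
```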

1 reaction
rohitgr7 commented, Mar 28, 2022

Well, yeah, that's true. Alternatively, you can pass the checkpoint path directly:

trainer.test(..., ckpt_path=checkpoint_callback2.best_model_path)

ckpt_path='best' selects the first checkpoint callback and takes the best model path from it. It was kept this way as a quick, handy feature, since in most cases there is only one checkpoint callback.

Maybe we could extend it a little: if 'best' is selected and multiple ModelCheckpoint callbacks are configured, we could raise a warning or an error.

cc @carmocca wdyt?
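
The proposal above could look roughly like this. It is a hypothetical sketch of the idea, not an existing Lightning API: resolve_best_ckpt_path and the FakeCheckpoint stand-in are made up here, and choosing to warn rather than raise is an assumption (the comment mentions either).

```python
import warnings
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeCheckpoint:
    """Hypothetical stand-in for ModelCheckpoint."""

    monitor: Optional[str]
    best_model_path: str


def resolve_best_ckpt_path(checkpoint_callbacks: List[FakeCheckpoint]) -> str:
    """Hypothetical resolution of ckpt_path='best' with the proposed warning."""
    if not checkpoint_callbacks:
        raise ValueError("ckpt_path='best' requires a checkpoint callback")
    if len(checkpoint_callbacks) > 1:
        # Several callbacks configured: 'best' is ambiguous, so tell the user.
        warnings.warn(
            "ckpt_path='best' uses only the first of "
            f"{len(checkpoint_callbacks)} checkpoint callbacks; "
            "pass the desired best_model_path explicitly instead.",
            UserWarning,
        )
    return checkpoint_callbacks[0].best_model_path


callbacks = [FakeCheckpoint(None, "last.ckpt"), FakeCheckpoint("val_loss", "best.ckpt")]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    path = resolve_best_ckpt_path(callbacks)

print(path, len(caught))  # last.ckpt 1
```

Raising instead of warning would be stricter but would break existing scripts that already rely on the first-callback behavior.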

Top Results From Across the Web

How to get the checkpoint path? - Trainer - PyTorch Lightning
Hi could you tell me how I can get the checkpoint path in this ... if save_top_k is None and monitor is not...
ModelCheckpoint - PyTorch Lightning - Read the Docs
After training finishes, use best_model_path to retrieve the path to the best checkpoint file and best_model_score to retrieve its score. Parameters.
ModelCheckpoint - Keras
ModelCheckpoint callback is used in conjunction with training using model.fit() to save a model or weights (in a checkpoint file) at some interval, ......
R80.20-R80.30 ClusterXL vlan monitoring - Check Point ...
Any other way to monitor all vlan then ? Can someone help ? Thank you. Best regards;. Furil.
Getting error with Pytorch lightning when passing model ...
checkpoint_callback = ModelCheckpoint( dirpath="checkpoints", filename="best-checkpoint", save_top_k=1, verbose=True, monitor="val_loss", mode=" ...