`best_model_path` does not retrieve the path to the best monitor checkpoint file

🐛 Bug

If there is more than one `ModelCheckpoint` and the first one in the callback list does NOT set `monitor`, `self.checkpoint_callback.best_model_path` is wrong: it does not point to the best checkpoint according to the monitored metric. For example:

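```python
import pytorch_lightning as pl

my_monitor = "val_loss"  # assumption: the report does not say which metric is monitored

callbacks = []
# Saves a checkpoint every epoch and has no monitor.
val_ckpt_callback = pl.callbacks.ModelCheckpoint(
    filename="val_end-{epoch}-{step}-{val_loss:.4f}-{val_ppl:.4f}",
    save_top_k=-1,
    every_n_epochs=1
)
callbacks.append(val_ckpt_callback)
# Keeps only the single best checkpoint according to `my_monitor`.
monitor_ckpt_callback = pl.callbacks.ModelCheckpoint(
    filename="monitor-{epoch}-{step}-{" + my_monitor + ":.4f}",
    monitor=my_monitor,
    save_top_k=1
)
callbacks.append(monitor_ckpt_callback)

trainer = pl.Trainer(callbacks=callbacks)
```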
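The ordering matters because, in Lightning 1.5, `trainer.checkpoint_callback` returns the first `ModelCheckpoint` found in the callback list. The library-free sketch below models only that lookup. `CheckpointStub` and `first_checkpoint_callback` are hypothetical stand-ins, not Lightning APIs, and the checkpoint file names are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckpointStub:
    """Hypothetical stand-in for ModelCheckpoint: only the fields this example needs."""
    monitor: Optional[str]
    best_model_path: str


def first_checkpoint_callback(callbacks: List[object]) -> Optional[CheckpointStub]:
    """Model of the lookup: keep only checkpoint callbacks, return the first one (or None)."""
    checkpoint_callbacks = [cb for cb in callbacks if isinstance(cb, CheckpointStub)]
    return checkpoint_callbacks[0] if checkpoint_callbacks else None


# Same order as in the report: the unmonitored callback is appended first.
val_ckpt = CheckpointStub(monitor=None, best_model_path="val_end-epoch=9.ckpt")
monitor_ckpt = CheckpointStub(monitor="val_loss", best_model_path="monitor-epoch=4.ckpt")
callbacks = [val_ckpt, monitor_ckpt]

chosen = first_checkpoint_callback(callbacks)
print(chosen.monitor)          # None
print(chosen.best_model_path)  # val_end-epoch=9.ckpt
```

Under this model, appending `monitor_ckpt_callback` before `val_ckpt_callback` in the report's snippet should make the monitored callback the one that is picked.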

To Reproduce

Expected behavior

`best_model_path` should always point to the best checkpoint according to the monitored metric.

Environment

- CUDA:
  - GPU: 8 × A100-SXM4-40GB
  - available: True
  - version: 11.3
- Packages:
  - numpy: 1.20.1
  - pyTorch_debug: False
  - pyTorch_version: 1.10.2+cu113
  - pytorch-lightning: 1.5.10
  - tqdm: 4.63.0
- System:
  - OS: Linux
  - architecture: 64bit
  - processor: x86_64
  - python: 3.7.10
  - version: #1 SMP Fri Mar 19 10:07:22 CST 2021

cc @carmocca @awaelchli @ninginthecloud @jjenniferdai @rohitgr7

Issue Analytics

Created: a year ago
Comments: 8 (6 by maintainers)

Top GitHub Comments

carmocca commented on Mar 28, 2022 (2 reactions)

> If the monitor is None, why do we need to save the best_model_path?

As I said in my previous message: "so that passing `ckpt_path="best"` still works for them".

People want to be able to pass `ckpt_path="best"` regardless of their monitor config. In this case, `best_model_path` would equal the last checkpoint saved.
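A simplified, library-free model of why the path "would equal the last checkpoint saved": with no monitor there is no score to compare, so every save replaces the best path, whereas with a monitor only an improving score does. `BestPathTracker` is hypothetical and deliberately ignores top-k bookkeeping, file deletion, and the rest of `ModelCheckpoint`'s logic.

```python
from typing import Dict, Optional


class BestPathTracker:
    """Hypothetical, simplified tracker; not Lightning's ModelCheckpoint implementation."""

    def __init__(self, monitor: Optional[str] = None, mode: str = "min") -> None:
        self.monitor = monitor
        self.mode = mode
        self.best_model_path = ""
        self.best_model_score: Optional[float] = None

    def _improves(self, score: float) -> bool:
        # The first score always counts as an improvement.
        if self.best_model_score is None:
            return True
        if self.mode == "min":
            return score < self.best_model_score
        return score > self.best_model_score

    def on_save(self, path: str, metrics: Dict[str, float]) -> None:
        if self.monitor is None:
            # No metric to rank by: the latest checkpoint becomes the "best" one.
            self.best_model_path = path
            return
        score = metrics[self.monitor]
        if self._improves(score):
            self.best_model_path = path
            self.best_model_score = score


# Hypothetical per-epoch validation losses.
history = [("epoch=0.ckpt", 0.9), ("epoch=1.ckpt", 0.5), ("epoch=2.ckpt", 0.7)]

unmonitored = BestPathTracker(monitor=None)
monitored = BestPathTracker(monitor="val_loss", mode="min")
for path, loss in history:
    unmonitored.on_save(path, {"val_loss": loss})
    monitored.on_save(path, {"val_loss": loss})

print(unmonitored.best_model_path)  # epoch=2.ckpt (last saved)
print(monitored.best_model_path)    # epoch=1.ckpt (lowest val_loss)
```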

This behavior could be changed if we have #11912

rohitgr7 commented on Mar 28, 2022 (1 reaction)

Well, yeah, that's true. Alternatively, you can pass the checkpoint path directly:

```python
trainer.test(..., ckpt_path=checkpoint_callback2.best_model_path)
```
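If no reference like `checkpoint_callback2` is kept around, the monitored callback can instead be looked up by its `monitor` attribute. A minimal sketch, assuming each callback exposes `monitor` and `best_model_path` the way `ModelCheckpoint` does; `best_path_for_monitor` is a hypothetical helper, not part of Lightning, and here it is exercised with `types.SimpleNamespace` stand-ins rather than real callbacks.

```python
from types import SimpleNamespace
from typing import Iterable


def best_path_for_monitor(checkpoint_callbacks: Iterable, monitor: str) -> str:
    """Hypothetical helper: best_model_path of the single callback monitoring `monitor`."""
    matches = [cb for cb in checkpoint_callbacks if getattr(cb, "monitor", None) == monitor]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one checkpoint callback monitoring {monitor!r}, found {len(matches)}"
        )
    return matches[0].best_model_path


# Stand-ins for the two callbacks from the report (file names are made up).
callbacks = [
    SimpleNamespace(monitor=None, best_model_path="val_end-epoch=9.ckpt"),
    SimpleNamespace(monitor="val_loss", best_model_path="monitor-epoch=4.ckpt"),
]
print(best_path_for_monitor(callbacks, "val_loss"))  # monitor-epoch=4.ckpt

# With Lightning, the list could come from trainer.checkpoint_callbacks (not run here):
# trainer.test(..., ckpt_path=best_path_for_monitor(trainer.checkpoint_callbacks, "val_loss"))
```

Failing loudly when zero or several callbacks match avoids silently falling back to an arbitrary checkpoint, which is exactly the failure mode in this issue.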

`ckpt_path="best"` selects the first checkpoint callback and takes the best model path from it. It was kept this way as a quick, handy feature for users since, in the majority of cases, there is only one checkpoint callback.

Maybe we could extend it a little: if `"best"` is selected while multiple ModelCheckpoint callbacks are configured, we could raise a warning or an error.
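The proposal could look roughly like this: resolve `"best"` from the first checkpoint callback, as described above, but warn when more than one is configured. This is a hypothetical illustration using Python's standard `warnings` module, not code from Lightning; `resolve_best_ckpt_path` and the stub callbacks are made up for the example.

```python
import warnings
from types import SimpleNamespace
from typing import Optional, Sequence


def resolve_best_ckpt_path(checkpoint_callbacks: Sequence) -> Optional[str]:
    """Hypothetical resolution of ckpt_path="best": use the first callback, warn on ambiguity."""
    if not checkpoint_callbacks:
        return None
    if len(checkpoint_callbacks) > 1:
        warnings.warn(
            f'ckpt_path="best" is ambiguous: {len(checkpoint_callbacks)} checkpoint callbacks '
            "are configured and only the first one is used. Pass the path explicitly instead.",
            UserWarning,
        )
    return checkpoint_callbacks[0].best_model_path


callbacks = [
    SimpleNamespace(monitor=None, best_model_path="val_end-epoch=9.ckpt"),
    SimpleNamespace(monitor="val_loss", best_model_path="monitor-epoch=4.ckpt"),
]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded every time
    path = resolve_best_ckpt_path(callbacks)

print(path)         # val_end-epoch=9.ckpt
print(len(caught))  # 1
```

Raising an error instead of warning would be a one-line change, at the cost of breaking scripts that currently rely on the first-callback behavior.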

cc @carmocca wdyt?
