How to implement model checkpointing for best models
Hi,

How can I checkpoint a model based on accuracy instead of saving checkpoints based on the number of iterations? Currently I am using cfg.SOLVER.CHECKPOINT_PERIOD, which saves by iteration count. What do I have to do to save only the best weights instead of saving the model after every 5000 iterations?
Issue Analytics
- Created: 4 years ago
- Reactions: 4
- Comments: 6
Top Results From Across the Web

How to Checkpoint Deep Learning Models in Keras
A good use of checkpointing is to output the model weights each time an improvement is observed during training. The example below creates...

Model Checkpointing for DL - Analytics Vidhya
Improving your Deep Learning model using Model Checkpointing (Implementation) - Part 2 · 1. Loading the dataset · 2. Pre-processing the data · 3.

Checkpointing Deep Learning Models in Keras
In this article, you will learn how to checkpoint a deep learning model built using Keras and then reinstate the model architecture and...

Checkpointing Tutorial for TensorFlow, Keras, and PyTorch
This post will demonstrate how to checkpoint your training models on FloydHub so that you can resume your experiments from these saved...

Checkpointing Models — H2O 3.38.0.3 documentation
To resume model training, use checkpoint model keys (model_id) to incrementally train a specific model using more iterations, more data, different...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
With plain_train_net.py, you can add this logic using the return value of do_test: https://github.com/facebookresearch/detectron2/blob/5e2a6f62ef752c8b8c700d2e58405e4bede3ddbe/tools/plain_train_net.py#L175-L182

Alternatively, it can also be implemented as a hook to be used with trainers, but that's often more work.
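The "save only on improvement" logic the comment points at can be sketched in plain Python. This is illustrative, not detectron2 API: `BestMetricTracker` and `should_save` are made-up names, standing in for comparing the metric returned by evaluation (e.g. `do_test`) against the best seen so far and calling the checkpointer only when it improves.

```python
import math

class BestMetricTracker:
    """Tracks the best value of a validation metric (higher is better)
    and reports when a new "best" checkpoint should be saved."""

    def __init__(self):
        self.best = -math.inf

    def should_save(self, metric):
        """Return True if `metric` improves on the best seen so far."""
        if metric > self.best:
            self.best = metric
            return True
        return False

# Example: only "save" at evaluation points where accuracy improves.
tracker = BestMetricTracker()
accuracies = [0.61, 0.58, 0.67, 0.67, 0.72]
saved_at = [i for i, acc in enumerate(accuracies) if tracker.should_save(acc)]
print(saved_at)  # [0, 2, 4] — the evaluations that beat the previous best
```

In a training loop, the body of the `if` would call something like `checkpointer.save("model_best")` instead of just recording the index; evaluation with `do_test` would run every `cfg.TEST.EVAL_PERIOD` iterations rather than on a fixed list.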
The model computes losses when in training mode, so model(validation_inputs) will give you losses.

By writing the model yourself, you'll be able to decide what it returns if you'd like anything not supported by the existing models.

See also https://detectron2.readthedocs.io/tutorials/models.html#model-output-format and https://detectron2.readthedocs.io/tutorials/write-models.html
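The training-mode/eval-mode convention the comment relies on can be made concrete with a schematic, framework-free sketch. All names here are illustrative; with a real detectron2 model (a torch `nn.Module`) you would additionally wrap the validation forward pass in `torch.no_grad()`, but the control flow is the same.

```python
class SchematicModel:
    """Mimics the detectron2 convention: in training mode the model
    returns a dict of losses; in eval mode it returns predictions."""

    def __init__(self):
        self.training = True

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def __call__(self, inputs):
        if self.training:
            # A real model computes these against ground-truth annotations.
            return {"loss_cls": 0.5 * len(inputs), "loss_box_reg": 0.2 * len(inputs)}
        # In eval mode, return per-image predictions instead.
        return [{"instances": None} for _ in inputs]

model = SchematicModel()

# Keep the model in training mode to get losses on validation data
# (with a real torch model, also wrap this in `with torch.no_grad():`).
model.train()
losses = model(["img1", "img2"])
total_val_loss = sum(losses.values())

model.eval()
predictions = model(["img1", "img2"])
```

The validation loss computed this way can then drive the best-checkpoint decision, instead of (or alongside) an accuracy metric from an evaluator.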