ModelCheckpoint score_function confusing to use
I use create_supervised_trainer in combination with ModelCheckpoint:
from ignite.handlers import ModelCheckpoint

checkpoint = ModelCheckpoint(
    f"./models/{start_time}",
    "resnet_50",
    score_name="loss",
    # engine.state.output only contains the output of the last minibatch
    score_function=lambda engine: 1 - engine.state.output,
    n_saved=10,
    create_dir=True,
)
With the supervised trainer it is not straightforward to add a metric, so users only have access to state.output, which contains just the last minibatch output. This causes models to be saved based on the model's performance on the last minibatch rather than on the whole epoch.
It might be a good idea to add a warning to the description of ModelCheckpoint that output contains the minibatch output only, and/or to add an example that shows how to use it in combination with a metric (a possible sketch follows below).
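For reference, a minimal sketch of one way to combine ModelCheckpoint with a metric, assuming model, optimizer, criterion, train_loader and val_loader are already defined; the checkpoint handler is attached to an evaluator so that the score reflects the whole validation set rather than a single minibatch:

    from ignite.engine import Events, create_supervised_evaluator, create_supervised_trainer
    from ignite.handlers import ModelCheckpoint
    from ignite.metrics import Loss

    # model, optimizer, criterion, train_loader, val_loader are assumed to exist
    trainer = create_supervised_trainer(model, optimizer, criterion)
    evaluator = create_supervised_evaluator(model, metrics={"loss": Loss(criterion)})

    # Score is the negated validation loss computed over the whole validation set
    checkpoint = ModelCheckpoint(
        "./models",
        "resnet_50",
        score_name="val_loss",
        score_function=lambda engine: -engine.state.metrics["loss"],
        n_saved=10,
        create_dir=True,
    )

    @trainer.on(Events.EPOCH_COMPLETED)
    def run_validation(engine):
        evaluator.run(val_loader)

    # Attach to the evaluator so engine.state.metrics refers to validation metrics
    evaluator.add_event_handler(Events.COMPLETED, checkpoint, {"model": model})

Negating the validation loss makes a lower loss yield a higher score, which is how ModelCheckpoint ranks the n_saved best checkpoints.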
Issue Analytics
- State:
- Created 4 years ago
- Comments: 5
Top GitHub Comments
@oteph thanks for the feedback! Could you please explain what you would like to achieve using ModelCheckpoint on the trainer? In general, ModelCheckpoint is used to create training checkpoints and to save the best models according to validation scores. The above code works with master; in the stable version you can use save_interval.
I assume you are also aware that the model changes during training, so metrics computed during training do not represent the final performance of the model. Adding a running-average metric with ignite is simple, see https://pytorch.org/ignite/v0.2.1/metrics.html#ignite.metrics.RunningAverage. With this class you can track the running average of the loss and base the score function on this metric, for example as sketched below.
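A minimal sketch of that approach, assuming model, optimizer, criterion and the data loaders are already defined; RunningAverage smooths the per-iteration loss, so the score no longer depends on a single minibatch:

    from ignite.engine import Events, create_supervised_trainer
    from ignite.handlers import ModelCheckpoint
    from ignite.metrics import RunningAverage

    trainer = create_supervised_trainer(model, optimizer, criterion)

    # Expose a running average of the per-iteration loss as engine.state.metrics["loss"]
    RunningAverage(output_transform=lambda output: output).attach(trainer, "loss")

    checkpoint = ModelCheckpoint(
        "./models",
        "resnet_50",
        score_name="running_loss",
        # Higher score is better, so negate the running average of the loss
        score_function=lambda engine: -engine.state.metrics["loss"],
        n_saved=10,
        create_dir=True,
    )
    trainer.add_event_handler(Events.EPOCH_COMPLETED, checkpoint, {"model": model})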
Otherwise, since the score function only receives the engine, you can store your custom score in engine.state and return it from the score function, as in the sketch below. HTH
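A hypothetical sketch of that second option, reusing the trainer and model from above; the attribute name my_score and the helper some_validation_routine are made up for illustration:

    from ignite.engine import Events
    from ignite.handlers import ModelCheckpoint

    @trainer.on(Events.EPOCH_COMPLETED)
    def compute_custom_score(engine):
        # Any custom logic can go here; store the result on the engine state
        engine.state.my_score = some_validation_routine(model)  # hypothetical helper

    checkpoint = ModelCheckpoint(
        "./models",
        "resnet_50",
        score_name="my_score",
        score_function=lambda engine: engine.state.my_score,
        n_saved=10,
        create_dir=True,
    )
    # Registered after compute_custom_score, so the score is up to date when it runs
    trainer.add_event_handler(Events.EPOCH_COMPLETED, checkpoint, {"model": model})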
@oteph hope this makes it clearer. Also, do not hesitate to look at the other example scripts and notebooks to see how ignite helps make the code more flexible and better factored.
Feel free to close the issues if they are answered 😃