
ModelCheckpoint score_function confusing to use


I use create_supervised_trainer in combination with ModelCheckpoint.

checkpoint = ModelCheckpoint(
    f"./models/{start_time}",
    "resnet_50",
    score_name="loss",
    score_function=lambda engine: 1 - engine.state.output, # output contains the minibatch output
    n_saved=10,
    create_dir=True,
)

With the supervised trainer it is not straightforward to add a metric, so users only have access to engine.state.output, which contains only the last minibatch's output. This causes models to be saved based on the model's performance on the last minibatch, not on the whole epoch.

It might be a good idea to add a warning to the description of ModelCheckpoint that output contains only the minibatch output, and/or to add an example that shows how to use it in combination with a metric.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
vfdev-5 commented, Nov 29, 2019

@oteph thanks for the feedback! Could you please explain what you would like to achieve using ModelCheckpoint on the trainer?

In general, ModelCheckpoint is used to create training checkpoints and save best models according to validation scores.

  1. Create training checkpoints

        checkpoint_handler = ModelCheckpoint(dirname=output_path,
                                             filename_prefix="checkpoint")

        trainer.add_event_handler(Events.ITERATION_COMPLETED(every=1000),
                                  checkpoint_handler,
                                  {'model': model, 'optimizer': optimizer})

The above code works with master. In the stable version you can use save_interval instead:

        checkpoint_handler = ModelCheckpoint(dirname=output_path,
                                             filename_prefix="checkpoint",
                                             save_interval=1000)
        trainer.add_event_handler(Events.ITERATION_COMPLETED,
                                  checkpoint_handler,
                                  {'model': model, 'optimizer': optimizer})

  2. Save best models
        def default_score_fn(engine):
            score = engine.state.metrics['accuracy']
            return score

        best_model_handler = ModelCheckpoint(dirname=output_path,
                                             filename_prefix="best",
                                             n_saved=3,
                                             score_name="val_accuracy",
                                             score_function=default_score_fn)
        evaluator.add_event_handler(Events.COMPLETED, best_model_handler, {'model': model, })

With the supervised trainer it is not straightforward to add a metric, so users only have access to engine.state.output, which contains only the last minibatch's output. This causes models to be saved based on the model's performance on the last minibatch, not on the whole epoch.

I assume you are also aware that the model changes during training, so metrics computed during training do not represent the final performance of the model. Adding a running-average metric with ignite is simple, see https://pytorch.org/ignite/v0.2.1/metrics.html#ignite.metrics.RunningAverage With this class you can track a running average of the loss and base the score function on that metric.

Otherwise, since the score function only requires the engine, you can store your custom score in engine.state and return it as the score:

def score_function(engine):
    return engine.state.my_score

HTH

0 reactions
vfdev-5 commented, Nov 29, 2019

@oteph I hope this makes it clearer. Also, do not hesitate to look at the other example scripts and notebooks to see how ignite helps make the code more flexible and better factored.

Feel free to close the issues if they are answered 😃
