ModelCheckpoint score_function confusing to use
I use create_supervised_trainer in combination with ModelCheckpoint:
from ignite.handlers import ModelCheckpoint

checkpoint = ModelCheckpoint(
    f"./models/{start_time}",
    "resnet_50",
    score_name="loss",
    # engine.state.output only contains the output of the last minibatch
    score_function=lambda engine: 1 - engine.state.output,
    n_saved=10,
    create_dir=True,
)
With the supervised trainer it is not straightforward to add a metric, so users only have access to state.output, which contains just the last minibatch output. This causes models to be saved based on the model's performance on the last minibatch rather than on the whole epoch.
It might be a good idea to add a warning to the description of ModelCheckpoint that output contains the minibatch output only, and/or to add an example that shows how to use it in combination with a metric (a possible sketch follows below).
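For reference, a minimal sketch of one way to combine ModelCheckpoint with a metric, assuming model, optimizer, criterion, train_loader and val_loader are already defined; the checkpoint handler is attached to an evaluator so that the score reflects the whole validation set rather than a single minibatch:

    from ignite.engine import Events, create_supervised_evaluator, create_supervised_trainer
    from ignite.handlers import ModelCheckpoint
    from ignite.metrics import Loss

    # model, optimizer, criterion, train_loader, val_loader are assumed to exist
    trainer = create_supervised_trainer(model, optimizer, criterion)
    evaluator = create_supervised_evaluator(model, metrics={"loss": Loss(criterion)})

    # Score is the negated validation loss computed over the whole validation set
    checkpoint = ModelCheckpoint(
        "./models",
        "resnet_50",
        score_name="val_loss",
        score_function=lambda engine: -engine.state.metrics["loss"],
        n_saved=10,
        create_dir=True,
    )

    @trainer.on(Events.EPOCH_COMPLETED)
    def run_validation(engine):
        evaluator.run(val_loader)

    # Attach to the evaluator so engine.state.metrics refers to validation metrics
    evaluator.add_event_handler(Events.COMPLETED, checkpoint, {"model": model})

Negating the validation loss makes a lower loss yield a higher score, which is how ModelCheckpoint ranks the n_saved best checkpoints.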
Issue Analytics
- State:
- Created 4 years ago
- Comments: 5
Top GitHub Comments
@oteph thanks for the feedback! Could you please explain what you would like to achieve using ModelCheckpoint on the trainer? In general, ModelCheckpoint is used to create training checkpoints and to save the best models according to validation scores. The above code works with master; in the stable version you can use save_interval.
I assume you are also aware that the model changes during training, so metrics computed during training do not represent the final performance of the model. Adding a running-average metric with ignite is simple, see https://pytorch.org/ignite/v0.2.1/metrics.html#ignite.metrics.RunningAverage. With this class you can track the running average of the loss and base the score function on this metric, for example as sketched below.
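A minimal sketch of that approach, assuming model, optimizer, criterion and the data loaders are already defined; RunningAverage smooths the per-iteration loss, so the score no longer depends on a single minibatch:

    from ignite.engine import Events, create_supervised_trainer
    from ignite.handlers import ModelCheckpoint
    from ignite.metrics import RunningAverage

    trainer = create_supervised_trainer(model, optimizer, criterion)

    # Expose a running average of the per-iteration loss as engine.state.metrics["loss"]
    RunningAverage(output_transform=lambda output: output).attach(trainer, "loss")

    checkpoint = ModelCheckpoint(
        "./models",
        "resnet_50",
        score_name="running_loss",
        # Higher score is better, so negate the running average of the loss
        score_function=lambda engine: -engine.state.metrics["loss"],
        n_saved=10,
        create_dir=True,
    )
    trainer.add_event_handler(Events.EPOCH_COMPLETED, checkpoint, {"model": model})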
Otherwise, since the score function only receives the engine, you can store your custom score in engine.state and return it from the score function, as in the sketch below. HTH
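A hypothetical sketch of that second option, reusing the trainer and model from above; the attribute name my_score and the helper some_validation_routine are made up for illustration:

    from ignite.engine import Events
    from ignite.handlers import ModelCheckpoint

    @trainer.on(Events.EPOCH_COMPLETED)
    def compute_custom_score(engine):
        # Any custom logic can go here; store the result on the engine state
        engine.state.my_score = some_validation_routine(model)  # hypothetical helper

    checkpoint = ModelCheckpoint(
        "./models",
        "resnet_50",
        score_name="my_score",
        score_function=lambda engine: engine.state.my_score,
        n_saved=10,
        create_dir=True,
    )
    # Registered after compute_custom_score, so the score is up to date when it runs
    trainer.add_event_handler(Events.EPOCH_COMPLETED, checkpoint, {"model": model})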
@oteph hope this makes it clearer. Also, do not hesitate to look at the other example scripts and notebooks to see how ignite helps make the code more flexible and better factored.
Feel free to close the issues if they are answered 😃