ModelCheckpoint's _saved variable and EarlyStopping
I’m using ignite 0.2.1, similar to the transfer-learning-conv-ai repo by Hugging Face. In the checkpointing lines of that repo, you can see that:
- the checkpoint is being saved for every epoch
- just the last three saved checkpoints are being retained on disk
- the last checkpoint (due to _saved[-1]) is being renamed to be the final trained model
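For context, the pattern in that repo is roughly the following (paraphrased, not an exact quote; log_dir, WEIGHTS_NAME, trainer, and model come from the surrounding training script):

import os
from ignite.engine import Events
from ignite.handlers import ModelCheckpoint

# Save a checkpoint every epoch, keeping only the last three on disk.
checkpoint_handler = ModelCheckpoint(log_dir, 'checkpoint', save_interval=1, n_saved=3)
trainer.add_event_handler(Events.EPOCH_COMPLETED, checkpoint_handler,
                          {'mymodel': getattr(model, 'module', model)})

# ... after trainer.run(...) finishes, rename the most recently saved
# checkpoint to the final model filename.
os.rename(checkpoint_handler._saved[-1][1][-1], os.path.join(log_dir, WEIGHTS_NAME))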
In my code, I’m additionally using the EarlyStopping class with a configurable patience, like this:

valid_es_handler = EarlyStopping(patience=args.patience, score_function=early_stopping_score_function,
                                 trainer=trainer)
validator.add_event_handler(Events.COMPLETED, valid_es_handler)
Now, what I want to accomplish is this: identify and rename the best (in terms of validation-set score) trained model from the window of stored checkpoints.
I think the first change that needs to be made is n_saved=args.patience instead of n_saved=3, so that the window of saved checkpoints is equal to the patience used for early stopping.
Consequently, it looks like I need to provide the same early_stopping_score_function also to ModelCheckpoint via the score_function arg, and that would create a score-based priority queue of saved checkpoints.
And with those changes, it looks like _saved[-1] would still point to the “best” model checkpoint in the window. Is my understanding of the changes correct?
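Concretely, the configuration I have in mind looks roughly like this (just a sketch, untested; log_dir is a placeholder, and args and early_stopping_score_function come from my code above):

from ignite.handlers import ModelCheckpoint

# Keep a window of score-ranked checkpoints as large as the early-stopping
# patience, ranked by the same validation score used for early stopping.
best_checkpoint_handler = ModelCheckpoint(log_dir, 'best',
                                          n_saved=args.patience,
                                          score_function=early_stopping_score_function,
                                          score_name='valid_score')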
Also, I haven’t looked at the newer versions of ignite after 0.2.1, but could you please share what the breaking changes are (using the above linked code as an example)? I might consider upgrading to the latest ignite if the changes needed are minimal.
The other thing I don’t understand is this: the score function would be called on the engine, but for our use case, this engine should be the validator (for both EarlyStopping and ModelCheckpoint), right?
But this line in the transfer-learning-conv-ai repo:
trainer.add_event_handler(Events.EPOCH_COMPLETED, checkpoint_handler, {'mymodel': getattr(model, 'module', model)}) # "getattr" take care of distributed encapsulation
will end up making the score function call on the trainer Engine, if I understand correctly. How do I ensure that the validator is used for the score function in the checkpoint_handler?
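My guess is that the fix is simply to attach the score-based checkpoint handler to the validator instead of the trainer, so that the engine passed to the score function is the validator, along these lines (sketch, using the handler from my previous snippet):

from ignite.engine import Events

# Attaching the handler to the validator means the validator engine is passed
# to score_function, so the ranking is based on the validation score.
validator.add_event_handler(Events.COMPLETED, best_checkpoint_handler,
                            {'mymodel': getattr(model, 'module', model)})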
Top GitHub Comments
@g-karthik please tell us if @sdesrozis's solution does not fit.
There was a bug with that found recently: https://github.com/pytorch/ignite/pull/745. It has since been fixed and the fix is available in the nightly release.
Please see the release notes of 0.3.0, and keep us updated if you have other questions 😃
Thank you for this report +1
I don’t have ignite 0.2.1 in mind, but for checkpointing, please look at the following code. This snippet is from https://github.com/pytorch/ignite/blob/master/ignite/contrib/engines/common.py, which helps to define handlers.
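The gist of that helper is to build a ModelCheckpoint ranked by an evaluation metric and attach it to the evaluator, roughly like this sketch (output_path, evaluator, model, and the 'accuracy' metric name are placeholders, not the exact code from common.py):

from ignite.engine import Events
from ignite.handlers import ModelCheckpoint

# Rank saved checkpoints by a validation metric; the score value is suffixed
# in the checkpoint filename.
best_model_handler = ModelCheckpoint(dirname=output_path,
                                     filename_prefix='best',
                                     n_saved=3,
                                     score_name='val_accuracy',
                                     score_function=lambda engine: engine.state.metrics['accuracy'])
evaluator.add_event_handler(Events.COMPLETED, best_model_handler, {'mymodel': model})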
So it’s possible to save with respect to a metric 😃 and the score is suffixed in the name of the checkpoint file.
I hope it helps!
EDIT: Ok, you pointed out internal ignite code, so I suppose you have already seen that.
EDIT 2: for the second part of your question, I think the checkpoint handler should be attached to the evaluator (like in the snippet I shared). Although, I don’t know if that works with ignite 0.2.1 …
REMARK: Maybe we could refactor the code from HuggingFace to update it to a recent version of ignite? The requirements.txt refers to pytorch-ignite, so I guess 0.3 (see https://github.com/huggingface/transfer-learning-conv-ai/blob/master/requirements.txt). @vfdev-5 you should have more inputs.