reset_optimizer removes essential state in adaptive optimization algorithms
🐛 Describe the bug
The reset_optimizer function, which is called by default at the beginning of every training experience (at this line, via make_optimizer), reinitializes the optimizer with the model's parameters. This happens in base_strategy, so it applies by default to all strategies (and thus all algorithms) that inherit from it.
This does not cause problems if the optimizer is SGD. But when the optimizer is Adam, RMSprop, etc., which crucially track running statistics for each parameter (e.g. the first and second moments of the gradients) as part of their algorithm, calling this function deletes all of those statistics. Note that such adaptive optimizers are the ones most commonly used, notably also in the examples in Avalanche.
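To illustrate, here is a plain PyTorch sketch (not the actual Avalanche code; the reset below is only an assumed approximation of what reset_optimizer does): once Adam has taken a step, its state holds per-parameter statistics, and rebuilding the param groups while clearing that state wipes them out.

```python
import torch
from collections import defaultdict
from torch import nn

model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One update so Adam populates its per-parameter statistics (exp_avg, exp_avg_sq).
model(torch.randn(4, 10)).sum().backward()
optimizer.step()
assert len(optimizer.state) > 0

# Assumed approximation of what the reset does: rebuild the param groups
# from the (possibly adapted) model and clear the per-parameter state.
optimizer.state = defaultdict(dict)
optimizer.param_groups = []
optimizer.add_param_group({"params": list(model.parameters())})
assert len(optimizer.state) == 0  # Adam's running statistics are gone
```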
Fix
I can work on fixing this.
My proposal:
- Before the model_adaptation call, add a before_model_adaptation hook that stores the current optimizer as an attribute on the strategy object.
- Then, in reset_optimizer, determine the new parameters by comparing against the state_dict of the current model.
- If there are no new parameters, the optimizer is left unchanged. This covers many popular methods, including regularization-based and exemplar-based approaches, that do not expand the model.
- If the model has added parameters, so that new keys are detected in the state_dict, a new param_group containing only those parameters is added to the optimizer. This leaves the existing optimizer and its running statistics unchanged.
- Finally, the stored previous optimizer is deleted with del.

This works for all current (and future) optimization algorithms; a rough sketch of the logic is shown below.
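A self-contained sketch of this logic in plain PyTorch. The helper name and the comparison by parameter name are illustrative only, not the actual Avalanche API, and named_parameters is used instead of the full state_dict so that buffers are skipped.

```python
import torch
from torch import nn


def add_new_params_only(optimizer, model, old_param_names, **group_kwargs):
    # Hypothetical helper: parameters whose names were not present before model
    # adaptation are registered as a new param_group; the existing groups and
    # their running statistics (e.g. Adam's exp_avg/exp_avg_sq) stay untouched.
    new_params = [p for name, p in model.named_parameters()
                  if name not in old_param_names]
    if new_params:
        optimizer.add_param_group({"params": new_params, **group_kwargs})


# Toy "dynamic module": start with one head, add a second one later.
model = nn.ModuleDict({"head0": nn.Linear(10, 2)})
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
old_param_names = {name for name, _ in model.named_parameters()}

# Model adaptation adds new parameters for the new experience.
model["head1"] = nn.Linear(10, 2)
add_new_params_only(optimizer, model, old_param_names, lr=1e-3)
print(len(optimizer.param_groups))  # 2: old statistics kept, new params trainable
```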
Let me know if this works, and I'll create a PR and solve this.
Top GitHub Comments
Are you sure that this is necessary? Typically, whenever you have a new experience, you also have a domain shift (either new classes or a new domain). Therefore, I don't expect the optimizer's statistics to be relevant anymore, and I think resetting them is correct. However, I never did an experimental comparison.
I think the current behavior is reasonable, and of course it can be changed as you explained above by the user themselves (if necessary). If you have a strong reason to change the default behavior (some experiment or a paper), I'm happy to change it as you propose.
Alternatively, we could add an example that shows how to retain the optimizer's statistics and leave the default as is.
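For example, with a static model one could snapshot the optimizer state before the reset and load it back afterwards. A minimal sketch in plain PyTorch (not an Avalanche API; it assumes the parameter set does not change between experiences):

```python
import torch
from torch import nn

model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
model(torch.randn(4, 10)).sum().backward()
optimizer.step()                        # Adam's statistics are now populated

saved = optimizer.state_dict()          # snapshot before the reset
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # the default reset
optimizer.load_state_dict(saved)        # running statistics restored
assert len(optimizer.state) > 0         # only valid while the parameter set is unchanged
```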
Yes - so that would mean that dynamic modules would still have the problem, for which we can look for a solution later.
In the case of static models, leaving the optimizer unchanged would do.