
reset_optimizer removes essential parameters in adaptive optimization algorithms

See original GitHub issue

šŸ› Describe the bug The reset_optimizer function that is by default called at the beginning of every training experience at this line by the make_optimizer by default reinitializes the optimizer with the modelā€™s parameters. This is done for all strategies that inherit base_strategy,applying it by default to all algorithms.

This does not cause problems if the optimizer is SGD. But when the optimizer is Adam, RMSProp, or another adaptive method that crucially tracks the running mean, variance, and other per-parameter statistics as part of its algorithm, calling this function deletes all of those statistics. Note that such adaptive optimizers are the most commonly used ones, including in the examples in Avalanche.
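For context, a minimal PyTorch sketch (plain PyTorch, not Avalanche code) of what gets lost when the optimizer is rebuilt from the model's parameters, which is effectively what reset_optimizer does:

```python
import torch
from torch import nn

model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step populates Adam's per-parameter running statistics.
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()
print(len(optimizer.state))   # > 0: exp_avg / exp_avg_sq now exist for each parameter

# Re-creating the optimizer from the model's parameters, as the reset does,
# throws all of that state away.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
print(len(optimizer.state))   # 0: the running statistics are gone
```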

🦋 Fix

I can work on fixing this.

My proposal:

1. Before the model_adaptation call, add a before_model_adaptation function that stores the current optimizer as an attribute on the strategy object.

2. Then, in reset_optimizer, determine the new parameters using the state_dict of the current model.

3. If there are no new parameters, the optimizer is not modified. This covers many popular methods, including the regularization-based and exemplar-based ones, that do not expand the model.

4. If the model has added parameters, and new keys are therefore detected in the state_dict, a new param_group is added to the optimizer (sketched below). This leaves the previous optimizer and its running statistics unchanged.

5. Finally, the stored previous optimizer is deleted using del.

This works for all current and future optimization algorithms.
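A minimal sketch of the param_group idea, assuming a plain PyTorch model and optimizer; the helper name add_new_params and the comparison by parameter identity (rather than state_dict keys) are illustrative choices, not Avalanche API:

```python
import torch
from torch import nn

def add_new_params(optimizer, previous_params, model):
    """Register parameters created by model adaptation as a fresh param_group,
    leaving the existing groups and their running statistics untouched."""
    known = {id(p) for p in previous_params}
    new_params = [p for p in model.parameters() if id(p) not in known]
    if new_params:  # only touch the optimizer when the model actually grew
        optimizer.add_param_group({"params": new_params})

# Usage: remember the parameters before adaptation ...
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
previous_params = list(model.parameters())

# ... the strategy expands the model (e.g. a new head for new classes) ...
model.extra_head = nn.Linear(4, 3)

# ... and instead of rebuilding the optimizer, only the new parameters are added.
add_new_params(optimizer, previous_params, model)
```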


Let me know if this works, and I'll create a PR to solve this.

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
AntonioCarta commented, Aug 25, 2021

Are you sure that this is necessary? Typically, whenever you have a new experience, you also have a domain shift (either new classes or a new domain). Therefore, I don't expect the optimizer's statistics to be relevant anymore, and I think resetting them is correct. However, I never did an experimental comparison.

I think the current behavior is reasonable, and of course it can be changed as you explained above by the user themselves (if necessary). If you have a strong reason to change the default behavior (some experiment or a paper), I'm happy to change it as you propose.

Alternatively, we could add an example that shows how to retain the optimizer's statistics and leave the default as is.
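One hedged way such an example could look (plain PyTorch; the wiring into an Avalanche strategy or plugin is omitted and would be user-specific): save the optimizer's state_dict before the reset and restore it afterwards.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# ... training on the current experience fills optimizer.state ...
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()

saved = optimizer.state_dict()                              # snapshot before the reset
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # what the reset effectively does
optimizer.load_state_dict(saved)                            # running statistics are restored
```

Note that this only works as-is when the parameter set is unchanged; for dynamically expanding models the param_group approach above is still needed.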

0 reactions
ashok-arjun commented, Nov 21, 2021

Yes, so that would mean dynamic modules would still have the problem, for which we can look for a solution later.

In the case of static models, leaving the optimizers unchanged would suffice.

Read more comments on GitHub >
