
Grad of weight decay is wrong when using `update_rule.add_hook`

See original GitHub issue

Thank you for fixing issue https://github.com/chainer/chainer/issues/7335. However, with that modification the grad of weight decay is wrong when using `update_rule.add_hook`.

The flow of the update is as follows.

  1. call the optimizer's hooks: https://github.com/chainer/chainer/blob/5289f671411b7eaf90492df6463ff51dd0724e91/chainer/optimizer.py#L810
  2. divide the grad by the loss scale: https://github.com/chainer/chainer/blob/5289f671411b7eaf90492df6463ff51dd0724e91/chainer/optimizer.py#L206
  3. call the `UpdateRule`'s hooks: https://github.com/chainer/chainer/blob/5289f671411b7eaf90492df6463ff51dd0724e91/chainer/optimizer.py#L208
  4. update the parameters

So the grad contributed by weight decay is `loss_scale` times larger than the expected value when the hook is registered via `update_rule.add_hook`.
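The ordering problem above can be sketched numerically without Chainer. This is a minimal sketch under one assumption: per the fix in #7335, the weight decay hook compensates for loss scaling by adding `rate * loss_scale * param` to the (still-scaled) gradient, which is correct when the hook runs at step 1 but not at step 3. The names here (`weight_decay_hook`, `true_grad`) are illustrative, not Chainer APIs.

```python
loss_scale = 8.0
rate = 0.01          # weight decay rate
param = 2.0          # current parameter value
true_grad = 0.5      # gradient of the *unscaled* loss

# Backprop of the scaled loss produces a scaled gradient.
grad = true_grad * loss_scale

def weight_decay_hook(g):
    # Hypothetical hook that compensates for loss scaling (as after #7335):
    # it assumes the gradient it receives has NOT yet been divided.
    return g + rate * loss_scale * param

# Case 1: hook registered on the optimizer (runs at step 1, before division).
g1 = weight_decay_hook(grad)   # step 1: optimizer hooks see the scaled grad
g1 = g1 / loss_scale           # step 2: divide by loss scale

# Case 2: hook registered via update_rule.add_hook (runs at step 3, after division).
g2 = grad / loss_scale         # step 2: grad is already unscaled here
g2 = weight_decay_hook(g2)     # step 3: hook still multiplies by loss_scale

expected = true_grad + rate * param
print(g1, expected)  # case 1 matches the expected value
print(g2, expected)  # case 2: decay term is loss_scale times too large
```

Running this shows `g1 == expected`, while `g2` carries a decay term of `rate * loss_scale * param` instead of `rate * param` — exactly the `loss_scale`-times-larger error described above.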

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
toslunar commented, Sep 26, 2019

Could you replace the URLs pointing at the master branch with URLs pinned to a specific commit?

Done.

0 reactions
stale[bot] commented, Jan 24, 2020

This issue is closed as announced. Feel free to re-open it if needed.


Top Results From Across the Web

Weight Decay and Its Peculiar Effects - Towards Data Science
This also shows that weight decay will have a negative impact if the model is originally operating in the under-fitting region.
Difference between neural net weight decay and learning rate
The learning rate is a parameter that determines how much an updating step influences the current value of the weights. While weight decay...
Weight Decay in Machine Learning: Concepts - Data Analytics
Weight decay can be implemented by modifying the update rule for the weights such that the gradient is not only based on the...
Understanding and Scheduling Weight Decay | OpenReview
Weight decay is a popular and even necessary regularization technique for training ... Using a too large learning rate may cause bad convergence...
Weight decay in the optimizers is a bad idea (especially with ...
Correct me if I'm wrong, but there is no reason the beta and gamma parameters in BatchNorm should ever be subject to weight...
