
Grad of weight decay is wrong when using `update_rule.add_hook`

See original GitHub issue

Thank you for fixing issue https://github.com/chainer/chainer/issues/7335. However, with that modification the grad of weight decay is wrong when using `update_rule.add_hook`.

The flow of the update is as follows.

  1. call the optimizer's hooks: https://github.com/chainer/chainer/blob/5289f671411b7eaf90492df6463ff51dd0724e91/chainer/optimizer.py#L810
  2. divide the grad by the loss scale: https://github.com/chainer/chainer/blob/5289f671411b7eaf90492df6463ff51dd0724e91/chainer/optimizer.py#L206
  3. call the `UpdateRule`'s hooks: https://github.com/chainer/chainer/blob/5289f671411b7eaf90492df6463ff51dd0724e91/chainer/optimizer.py#L208
  4. update the parameters

So the grad contributed by weight decay is `loss_scale` times larger than the expected value when the hook is registered via `update_rule.add_hook`.
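The ordering problem above can be sketched numerically without Chainer. This is a minimal sketch under one assumption: per the fix in #7335, the weight decay hook compensates for loss scaling by adding `rate * loss_scale * param` to the (still-scaled) gradient, which is correct when the hook runs at step 1 but not at step 3. The names here (`weight_decay_hook`, `true_grad`) are illustrative, not Chainer APIs.

```python
loss_scale = 8.0
rate = 0.01          # weight decay rate
param = 2.0          # current parameter value
true_grad = 0.5      # gradient of the *unscaled* loss

# Backprop of the scaled loss produces a scaled gradient.
grad = true_grad * loss_scale

def weight_decay_hook(g):
    # Hypothetical hook that compensates for loss scaling (as after #7335):
    # it assumes the gradient it receives has NOT yet been divided.
    return g + rate * loss_scale * param

# Case 1: hook registered on the optimizer (runs at step 1, before division).
g1 = weight_decay_hook(grad)   # step 1: optimizer hooks see the scaled grad
g1 = g1 / loss_scale           # step 2: divide by loss scale

# Case 2: hook registered via update_rule.add_hook (runs at step 3, after division).
g2 = grad / loss_scale         # step 2: grad is already unscaled here
g2 = weight_decay_hook(g2)     # step 3: hook still multiplies by loss_scale

expected = true_grad + rate * param
print(g1, expected)  # case 1 matches the expected value
print(g2, expected)  # case 2: decay term is loss_scale times too large
```

Running this shows `g1 == expected`, while `g2` carries a decay term of `rate * loss_scale * param` instead of `rate * param` — exactly the `loss_scale`-times-larger error described above.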

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
toslunar commented, Sep 26, 2019

Could you replace the URLs pointing at the master branch with URLs pinned to a specific commit?

Done.

0 reactions
stale[bot] commented, Jan 24, 2020

This issue is closed as announced. Feel free to re-open it if needed.


Top Results From Across the Web

Weight Decay and Its Peculiar Effects - Towards Data Science
This also shows that weight decay will have a negative impact if the model is originally operating in the under-fitting region.
Difference between neural net weight decay and learning rate
The learning rate is a parameter that determines how much an updating step influences the current value of the weights. While weight decay...
Weight Decay in Machine Learning: Concepts - Data Analytics
Weight decay can be implemented by modifying the update rule for the weights such that the gradient is not only based on the...
Understanding and Scheduling Weight Decay | OpenReview
Weight decay is a popular and even necessary regularization technique for training ... Using a too large learning rate may cause bad convergence...
Weight decay in the optimizers is a bad idea (especially with ...
Correct me if I'm wrong, but there is no reason the beta and gamma parameters in BatchNorm should ever be subject to weight...
