About updating centers
Hi~ May I ask whether the implementation in this repo is equivalent to the original implementation in the paper?
```python
# by doing so, weight_cent would not impact on the learning of centers
for param in criterion_cent.parameters():
    param.grad.data *= (1. / args.weight_cent)
```
If we simply let alpha = alpha / lambda (alpha is lr_cent, lambda is weight_cent), will it be equivalent to the previous implementation?
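A quick sanity check of this (just a sketch with made-up shapes and values, not code from the repo): under plain SGD, rescaling the center gradient by 1 / weight_cent and stepping with lr_cent gives the same update as skipping the rescaling and stepping with lr_cent / weight_cent.

```python
# Sketch: compare the two schemes on a toy quadratic "center loss".
import torch

torch.manual_seed(0)
weight_cent, lr_cent = 0.5, 0.1

c1 = torch.nn.Parameter(torch.randn(3, 2))        # centers, scheme 1
c2 = torch.nn.Parameter(c1.detach().clone())      # identical copy, scheme 2

opt1 = torch.optim.SGD([c1], lr=lr_cent)                # rescale grad by 1/weight_cent
opt2 = torch.optim.SGD([c2], lr=lr_cent / weight_cent)  # alpha / lambda, no rescaling

x = torch.randn(3, 2)  # dummy features

loss1 = weight_cent * ((c1 - x) ** 2).sum()
loss1.backward()
c1.grad.data *= (1. / weight_cent)
opt1.step()

loss2 = weight_cent * ((c2 - x) ** 2).sum()
loss2.backward()
opt2.step()

print(torch.allclose(c1, c2))  # True: identical updates under vanilla SGD
```

Note that the equivalence only holds for vanilla SGD on the centers; with momentum or weight decay the two schemes diverge, because those terms are not rescaled by 1 / weight_cent.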
It also seems the original authors do not adopt the plain gradient w.r.t. c_j, and instead use the delta rule shown below.
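For reference, the center update rule from the center loss paper (Wen et al., ECCV 2016), rather than anything in this repo's code, is

```latex
\Delta c_j = \frac{\sum_{i=1}^{m} \delta(y_i = j)\,(c_j - x_i)}{1 + \sum_{i=1}^{m} \delta(y_i = j)},
\qquad
c_j^{t+1} = c_j^{t} - \alpha \cdot \Delta c_j^{t}
```

where m is the mini-batch size and δ(·) is 1 when the condition holds and 0 otherwise.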
Since in a mini-batch some centers occur more frequently than others, I guess the author of center loss intends to average each center's gradient by the number of samples belonging to that center in the mini-batch.
I have not written the backward function to normalize the gradient yet, because, by tuning the learning rate and alpha, the code provided by KaiyangZhou already achieves reasonable performance.
We could give it a try and compare the performance: https://pytorch.org/tutorials/beginner/examples_autograd/two_layer_net_custom_function.html
@luzai Yeah, the sum of gradients should be normalized by the number of examples belonging to the center in the mini-batch.
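A minimal sketch of such a backward (my own draft following the tutorial linked above, not code from the repo; it assumes features of shape (batch, feat_dim), integer labels, and centers stored as an nn.Parameter of shape (num_classes, feat_dim)):

```python
import torch
from torch.autograd import Function


class CenterLossFn(Function):
    """Center loss whose backward normalizes each center's gradient by the
    number of examples assigned to that center in the mini-batch."""

    @staticmethod
    def forward(ctx, features, labels, centers):
        ctx.save_for_backward(features, labels, centers)
        centers_batch = centers.index_select(0, labels.long())
        # 0.5 * mean squared distance between each feature and its class center
        return ((features - centers_batch) ** 2).sum() / 2.0 / features.size(0)

    @staticmethod
    def backward(ctx, grad_output):
        features, labels, centers = ctx.saved_tensors
        centers_batch = centers.index_select(0, labels.long())
        diff = centers_batch - features  # c_{y_i} - x_i, shape (batch, feat_dim)

        # Count how many samples in the batch belong to each center.
        counts = torch.zeros(centers.size(0), device=features.device)
        counts.index_add_(0, labels.long(),
                          torch.ones(labels.size(0), device=features.device))

        # Sum the per-sample differences for each center, then normalize by
        # (1 + count) as in the delta rule, instead of by the batch size.
        grad_centers = torch.zeros_like(centers)
        grad_centers.index_add_(0, labels.long(), diff)
        grad_centers = grad_centers / (1.0 + counts).unsqueeze(1)

        # Gradient w.r.t. the features stays the usual (x_i - c_{y_i}) / batch.
        grad_features = grad_output * (features - centers_batch) / features.size(0)
        return grad_features, None, grad_output * grad_centers


# Usage sketch (names are placeholders):
# centers = torch.nn.Parameter(torch.randn(num_classes, feat_dim))
# loss_cent = CenterLossFn.apply(feats, labels, centers)
```

The 1 / weight_cent rescaling in the training script would still be needed (or folded into lr_cent as discussed above), since it only undoes the loss weighting, which is separate from the per-center averaging done here.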