Same distribution, nonzero loss?
Hi, I observed that my model is failing to converge. While debugging the code, I am observing this peculiar behavior:
torch.sum(-F.softmax(student_out[0][0], -1) * F.log_softmax(student_out[0][0], -1), -1)
returns 6.9058
Shouldn’t this theoretically return 0 since both are the same distribution?
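For reference, the expression above sums -p * log p over a single distribution, which is the entropy H(p); it is zero only when p is one-hot, and for a near-uniform distribution over K classes it is roughly log K. A minimal standalone sketch with made-up logits (not the model's actual student_out):

import torch
import torch.nn.functional as F

# Uniform distribution over 1000 classes: entropy equals ln(1000) ~ 6.9078
logits = torch.zeros(1000)
p = F.softmax(logits, dim=-1)
entropy = torch.sum(-p * F.log_softmax(logits, dim=-1), dim=-1)
print(entropy)  # tensor(6.9078)

# Nearly one-hot distribution: entropy is effectively 0
sharp = torch.full((1000,), -20.0)
sharp[0] = 20.0
p = F.softmax(sharp, dim=-1)
print(torch.sum(-p * F.log_softmax(sharp, dim=-1), dim=-1))  # ~ tensor(0.)

If the head has on the order of 1000 outputs, a near-uniform distribution gives ln(1000) ≈ 6.9078, which is very close to the 6.9058 observed, so the value is consistent with this simply being the entropy of a nearly uniform softmax output.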
Issue Analytics
- Created 2 years ago
- Comments: 5
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I see what you mean: it is indeed the cross-entropy loss, which is the loss used for DINO pretraining. What I don't understand is how, in the logs provided by the author, the cross-entropy loss can go as low as 2~3 when H(p, q) = H(p) + KL(p ‖ q).
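A rough numerical check of that identity, with made-up logits and temperatures (nothing here comes from the actual DINO run): since H(p, q) = H(p) + KL(p ‖ q), the cross-entropy is bounded below by H(p), so it can drop well below log K once the target p is sharpened, even though it never reaches 0 against a non-one-hot target.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
K = 1000
teacher_logits = torch.randn(K)
student_logits = teacher_logits + 0.1 * torch.randn(K)  # student roughly tracking the teacher

p = F.softmax(teacher_logits / 0.1, dim=-1)        # low temperature -> sharp target
log_p = F.log_softmax(teacher_logits / 0.1, dim=-1)
log_q = F.log_softmax(student_logits / 0.1, dim=-1)

cross_entropy = torch.sum(-p * log_q)
entropy_p = torch.sum(-p * log_p)
kl_pq = torch.sum(p * (log_p - log_q))
print(cross_entropy.item(), (entropy_p + kl_pq).item())  # the two values agree up to float error
print(entropy_p.item())                                   # well below log(1000) ~ 6.9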
I see. I have found the piece of code for the “DINO loss” (cross-entropy as you mentioned):
https://github.com/facebookresearch/dino/blob/cb711401860da580817918b9167ed73e3eef3dcf/main_dino.py#L380-L390
https://github.com/facebookresearch/dino/blob/cb711401860da580817918b9167ed73e3eef3dcf/main_dino.py#L392-L402
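For orientation, here is a stripped-down sketch of that kind of teacher/student cross-entropy term. The helper name, tensor shapes and temperatures below are hypothetical; the linked implementation additionally handles teacher centering, the teacher-temperature schedule and multi-crop pairing.

import torch
import torch.nn.functional as F

def cross_entropy_term(teacher_out, student_out, teacher_temp=0.04, student_temp=0.1):
    # -sum_k q_t(k) * log q_s(k), averaged over the batch
    q_t = F.softmax(teacher_out / teacher_temp, dim=-1).detach()  # sharp target, no gradient
    log_q_s = F.log_softmax(student_out / student_temp, dim=-1)
    return torch.sum(-q_t * log_q_s, dim=-1).mean()

# Toy usage with random head outputs (batch of 8, hypothetical output dim of 256)
teacher_out = torch.randn(8, 256)
student_out = torch.randn(8, 256, requires_grad=True)
loss = cross_entropy_term(teacher_out, student_out)
loss.backward()
print(loss.item())

Detaching the teacher output mirrors the stop-gradient on the teacher branch: only the student receives gradients from this loss.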