Loss increases with a larger input range
When I multiply the input by sqrt(512), the loss increases from the very beginning of training, reaching the order of 1e2~1e3. The only change is in fconv.py:
    def Embedding(num_embeddings, embedding_dim, padding_idx):
        m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx)
        m.weight.data.normal_(0, 0.1)
        m.weight.data.mul_(math.sqrt(embedding_dim))  # I add it here
        return m
I thought the operation above only changes the input's magnitude from 1e-4 to 1e-2. Can you tell me why the loss is exploding? PS: I'm sure the loss decreases normally without mul_.
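A quick way to sanity-check what the extra mul_ does to the initial weights is to measure their standard deviation before and after scaling. The sketch below is not fconv.py code: it uses plain Python random draws as a stand-in for normal_(0, 0.1), and the sample size and variable names are assumptions made only for this check.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

embedding_dim = 512    # dimension implied by sqrt(512) in the post
num_samples = 100_000  # hypothetical sample size, chosen only for this check

# Stand-in for m.weight.data.normal_(0, 0.1)
weights = [random.gauss(0.0, 0.1) for _ in range(num_samples)]

# Stand-in for m.weight.data.mul_(math.sqrt(embedding_dim))
scaled = [w * math.sqrt(embedding_dim) for w in weights]

std_before = statistics.pstdev(weights)
std_after = statistics.pstdev(scaled)
scale = std_after / std_before

print(f"std before scaling: {std_before:.3f}")
print(f"std after scaling:  {std_after:.3f}")
print(f"ratio:              {scale:.2f}")
```

Multiplying by a constant scales the standard deviation by that same constant, so the ratio is sqrt(512), about 22.6, and a standard deviation of 0.1 becomes roughly 2.26.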
Here is part of the log:
| epoch 001: 0%| | 13/14254 [00:19<5:38:48, 1.43s/it, loss=1075.60 (897.77), wps=4021, wpb=5703, bsz=131, lr=0.25, clip=100%, gnorm=1689590071670.1538]
| epoch 001: 1%|3 | 114/14254 [02:44<5:46:16, 1.47s/it, loss=1859.97 (1038.92), wps=3977, wpb=5695, bsz=169, lr=0.25, clip=100%, gnorm=7599585518747.3330]
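For reading the log, it may help to recall how clipping by gradient norm works. The sketch below assumes that gnorm is the L2 norm of the gradient before clipping and that clip=100% means clipping triggered on every update; that is a reading of the log fields, not something stated in this thread. The function name, the threshold, and the gradient values are hypothetical, and this is not the training script's own implementation.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Return (clipped_grads, original_norm).

    If the L2 norm of grads exceeds max_norm, every entry is scaled by
    max_norm / norm; otherwise the values are returned unchanged.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        factor = max_norm / total_norm
        grads = [g * factor for g in grads]
    return list(grads), total_norm

max_norm = 0.1  # hypothetical threshold, not taken from the log

# Hypothetical exploding gradient with norm 5e11
clipped, gnorm = clip_by_global_norm([3.0e11, -4.0e11], max_norm)
clipped_norm = math.sqrt(sum(g * g for g in clipped))
print(f"gnorm before clipping: {gnorm:.3e}")
print(f"norm after clipping:   {clipped_norm:.3f}")

# Hypothetical small gradient: norm 0.05 is under the threshold, so nothing changes
small, small_norm = clip_by_global_norm([0.03, 0.04], max_norm)
print(f"small gradient kept as-is: {small}")
```

Under these assumptions, clipping bounds the size of each update but not its direction, so a gradient norm around 1e12 still signals that the forward pass is producing extreme values.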
Issue Analytics
- State:
- Created 6 years ago
- Comments:6 (4 by maintainers)
Top GitHub Comments
Those values worked well based on cross validation experiments.
Thank you. Is there an explanation for why the input should be in a small range (normal(0, 0.1)) instead of normal(0, 1)?