loss increasing with larger input range

When I multiply the input by sqrt(512), the loss increases from the very beginning of training and stays on the order of 1e2 to 1e3. The only change is in fconv.py:

import math
import torch.nn as nn

def Embedding(num_embeddings, embedding_dim, padding_idx):
    m = nn.Embedding(num_embeddings, embedding_dim, padding_idx=padding_idx)
    m.weight.data.normal_(0, 0.1)  # original initialization
    m.weight.data.mul_(math.sqrt(embedding_dim))  # I add it here
    return m

I thought the operation above only changes the input’s magnitude from 1e-4 to 1e-2. Can you tell me why the loss explodes? PS: I’m sure the loss decreases normally without mul_.

Here is part of the log:

| epoch 001:   0%|                                         | 13/14254 [00:19<5:38:48,  1.43s/it, loss=1075.60 (897.77), wps=4021, wpb=5703, bsz=131, lr=0.25, clip=100%, gnorm=1689590071670.1538]
| epoch 001:   1%|3                                      | 114/14254 [02:44<5:46:16,  1.47s/it, loss=1859.97 (1038.92), wps=3977, wpb=5695, bsz=169, lr=0.25, clip=100%, gnorm=7599585518747.3330]
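
As a rough check of how much the added mul_ call changes the scale of the embedding weights, the sketch below samples values from normal(0, 0.1) and multiplies them by sqrt(512). It uses only the Python standard library as a stand-in for nn.Embedding. The function name scaled_init_std, the sample count, and the seed are assumptions made for illustration and are not part of fconv.py.

```python
import math
import random
import statistics


def scaled_init_std(embedding_dim=512, base_std=0.1, n_samples=20_000, seed=0):
    """Return the empirical std of N(0, base_std) samples before and after
    multiplying them by sqrt(embedding_dim). Hypothetical helper, not fairseq code."""
    rng = random.Random(seed)
    # Stand-in for m.weight.data.normal_(0, 0.1)
    weights = [rng.gauss(0.0, base_std) for _ in range(n_samples)]
    # Stand-in for m.weight.data.mul_(math.sqrt(embedding_dim))
    scale = math.sqrt(embedding_dim)
    scaled = [w * scale for w in weights]
    return statistics.pstdev(weights), statistics.pstdev(scaled)


before, after = scaled_init_std()
print(f"std before: {before:.3f}, std after: {after:.3f}, ratio: {after / before:.2f}")
```

The ratio of the two standard deviations is sqrt(512), about 22.6, so weights drawn with a standard deviation of 0.1 end up with a standard deviation of roughly 2.3. This only quantifies the scale of the weights. It does not by itself explain why training diverges.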

Top GitHub Comments

michaelauli commented, Nov 5, 2017

Those values worked well based on cross validation experiments.

Zrachel commented, Nov 1, 2017

Thank you. Is there any explanation on why input should be in a small range (normal(0, 0.1)) instead of normal(0, 1)?
