Very small gradients causing no weight updates in the model
Thanks for your code. It helped me understand BiDAF in detail.

However, I found that the model's performance never improves: the metric stays the same every epoch. Digging in, I found that the gradients computed during optimization are very small, on the order of 10^-3 to 10^-8.

I can't figure out what's wrong, and I think your code is otherwise easy to understand, so what might the problem be?
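Since the complaint is about tiny gradients, a quick way to confirm where they vanish is to print per-parameter gradient norms right after backpropagation. This is a minimal sketch assuming a PyTorch model; `model`, `loss`, and `optimizer` are generic placeholders, not names taken from this repository.

```python
import torch

def report_grad_norms(model: torch.nn.Module) -> None:
    """Print the gradient norm of every parameter to locate vanishing layers."""
    for name, param in model.named_parameters():
        if param.grad is None:
            print(f"{name}: no gradient")
        else:
            print(f"{name}: grad norm = {param.grad.norm().item():.3e}")

# Usage inside a training step (placeholders, not this repo's loop):
#   loss.backward()
#   report_grad_norms(model)
#   optimizer.step()
```

If the norms shrink sharply from the output layers toward the input layers, that points at vanishing gradients rather than, say, a learning-rate or data problem.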
Issue Analytics
- Created: 6 years ago
- Comments: 31 (19 by maintainers)
Top Results From Across the Web
The Vanishing Gradient Problem: The Problem, Its Causes ...
A small gradient means that the weights and biases of the initial layers will not be updated effectively with each training session.

Vanishing and Exploding Gradients in Neural Network ...
As aforementioned, one primary cause of exploding gradients lies in too large a weight initialization and update, and this is the reason...

A Gentle Introduction to Exploding Gradients in Neural ...
Exploding gradients are a problem where large error gradients accumulate and result in very large updates to neural network model weights...

Vanishing and Exploding Gradients in Deep Neural Networks
This, in turn, causes very large weight updates and causes the gradient descent to diverge. This is known as the exploding gradients problem...

Debugging Neural Networks with PyTorch and W&B Using ...
If your model is not overfitting, it might be that ... with vanishing gradients, the weight updates are very small, while...
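The articles above point at two standard mitigations: scale-aware weight initialization and gradient clipping. Below is a minimal, generic PyTorch sketch of both; the toy model, optimizer, and hyperparameters are illustrative assumptions, not the BiDAF code under discussion.

```python
import torch
import torch.nn as nn

# Toy stand-in model; not the BiDAF architecture from this issue.
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 1))

# Xavier initialization keeps activation/gradient variance roughly constant
# across layers, which counteracts vanishing and exploding gradients.
for module in model.modules():
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def training_step(x: torch.Tensor, y: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    # Clip the global gradient norm so one bad batch cannot blow up the weights.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    optimizer.step()
    return loss.item()
```

Clipping guards against the exploding case; for the vanishing case described in this issue, initialization, architecture (e.g. highway/residual connections, as BiDAF itself uses), and the loss computation are the usual suspects.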
Top GitHub Comments
Hmm, I am trying to hook the code up to TensorBoard so I can compare against the Keras training log and get a clearer picture. By the way, I have a deadline coming up, so I can't spend all my time on this. But if I make any progress, I will let you know.
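For reference, gradient logging of the kind described in that comment could look like the sketch below, using `torch.utils.tensorboard.SummaryWriter`. This is an assumed setup (the commenter may equally have used tensorboardX or Keras callbacks), not code from the repository.

```python
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/bidaf-debug")  # hypothetical log directory

def log_gradients(model: torch.nn.Module, step: int) -> None:
    """Record per-parameter gradient histograms and norms for TensorBoard."""
    for name, param in model.named_parameters():
        if param.grad is not None:
            writer.add_histogram(f"grad/{name}", param.grad, step)
            writer.add_scalar(f"grad_norm/{name}", param.grad.norm().item(), step)

# Call after loss.backward() each step:
#   log_gradients(model, global_step)
```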
Thanks @jojonki. I wanted to use a modification of BiDAF for transfer learning from span prediction to QA. I have implemented a version of it, but I cannot get it to work as advertised. Thanks for your work.