
Very small gradients causing no weight update from your model

See original GitHub issue

Thanks for your code. It helped me understand BiDAF in detail.

However, I found that the model's performance does not improve: the metric stays the same every epoch. Digging in, I found that the gradients seen by the optimizer are too small, on the order of 10^-3 to 10^-8.

I can't find what's wrong, and I think the code itself is easy to understand. So, what might the problem be?
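
One way to confirm this diagnosis is to print per-parameter gradient norms right after the backward pass. A minimal sketch, assuming a PyTorch model bound to a variable named model (the variable name and the placement in the training loop are assumptions, not code from the repository):

```python
import torch

def report_grad_norms(model: torch.nn.Module) -> None:
    """Print the gradient norm of every parameter after loss.backward()."""
    for name, param in model.named_parameters():
        if param.grad is None:
            print(f"{name}: no gradient (parameter unused in the graph?)")
        else:
            print(f"{name}: grad norm = {param.grad.norm().item():.3e}")

# Typical use inside the training loop:
# loss.backward()
# report_grad_norms(model)
# optimizer.step()
```

If the norms for the early layers sit many orders of magnitude below those of the output layers, that points at vanishing gradients rather than, say, a learning-rate or data problem.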

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Comments: 31 (19 by maintainers)

Top GitHub Comments

1 reaction
oneTaken commented, Dec 8, 2017

Emm, I am trying to wire the code into TensorBoard so I can compare it with the Keras training log and get a clearer picture (roughly along the lines of the sketch after the quoted message below). By the way, I have a deadline coming up, so I can't spend all my time on this. But if I make any improvement, I will tell you.

On Thu, Dec 7, 2017 at 3:02 AM, Junki Ohmura notifications@github.com wrote:

So far, there are no improvements, even though I modified the following items…

  • RNN -> LSTM
  • original loss function
  • use BiDAF’s script to build dataset

— Reply to this email directly or view it on GitHub: https://github.com/jojonki/BiDAF/issues/1#issuecomment-349740879
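
The logging oneTaken describes might look roughly like the following. This is a sketch, not code from the repository: it uses today's torch.utils.tensorboard (a 2017-era version would more likely have used the near-identical tensorboardX API), and model and step are assumed names:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/bidaf-debug")  # hypothetical log directory

def log_gradients(model, step):
    """Record a histogram and a scalar norm for each parameter's gradient."""
    for name, param in model.named_parameters():
        if param.grad is not None:
            writer.add_histogram(f"grad/{name}", param.grad, step)
            writer.add_scalar(f"grad_norm/{name}", param.grad.norm().item(), step)

# After loss.backward() in the training loop:
# log_gradients(model, step)
```

Plotting the norms over training steps makes it easy to compare against the Keras run and to see whether the small gradients appear from the first step or decay over time.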

0 reactions
aneesh-joshi commented, Aug 4, 2018

Thanks, @jojonki. I wanted to use a modification of BiDAF for transfer learning from span prediction to QA. I have implemented a version of it, but I cannot get it to work as advertised.

Thanks for your work.

Read more comments on GitHub >

Top Results From Across the Web

  • The Vanishing Gradient Problem. The Problem, Its Causes ...
    A small gradient means that the weights and biases of the initial layers will not be updated effectively with each training session.
  • Vanishing and Exploding Gradients in Neural Network ...
    As aforementioned, one primary cause of gradients exploding lies in too large of a weight initialization and update, and this is the reason...
  • A Gentle Introduction to Exploding Gradients in Neural ...
    Exploding gradients are a problem where large error gradients accumulate and result in very large updates to neural network model weights ...
  • Vanishing and Exploding Gradients in Deep Neural Networks
    This, in turn, causes very large weight updates and causes the gradient descent to diverge. This is known as the exploding gradients problem...
  • Debugging Neural Networks with PyTorch and W&B Using ...
    If your model is not overfitting, it might be because ... with vanishing gradients, the weight updates are very small while...
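
The remedies these results point at are standard: a variance-preserving weight initialization so early layers keep receiving gradient signal, and gradient clipping to tame exploding updates. A minimal PyTorch sketch of both, with model and optimizer assumed to exist (neither snippet is taken from the BiDAF repository):

```python
import torch
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    """Xavier-initialize Linear layers to keep activation/gradient variance stable."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# model.apply(init_weights)  # apply once, before training

# In the training loop, clip the global gradient norm before stepping:
# loss.backward()
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
# optimizer.step()
```

Clipping addresses exploding gradients only; for the vanishing gradients reported in this issue, initialization, gated units such as LSTMs (already tried above), and checking the loss implementation are the more relevant levers.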
