
Question about second-order gradients for GRU

See original GitHub issue

Hi, I was trying to use the package to obtain second-order gradients through the optimization process of a model with GRU units, each followed by a linear layer. However, when I check torch.autograd.grad(loss, learner_fmodel.parameters(time=0)) I get the error RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior. With allow_unused=True, I see that the gradients with respect to the GRU parameters are None, whereas the gradients with respect to the linear layer have values. I was wondering whether this is indeed supported for GRU.
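For context, here is a minimal sketch of the kind of setup that triggers this. The toy model, dimensions, and losses are illustrative assumptions, not taken from the reporter's code; only the torch.autograd.grad call over fmodel.parameters(time=0) mirrors the question.

    import torch
    import torch.nn as nn
    import higher

    class GRUModel(nn.Module):
        def __init__(self, in_dim=8, hid_dim=16):
            super().__init__()
            self.gru = nn.GRU(in_dim, hid_dim, batch_first=True)
            self.linear = nn.Linear(hid_dim, 1)

        def forward(self, x):
            out, _ = self.gru(x)            # out: (batch, seq, hid)
            return self.linear(out[:, -1])  # predict from the last timestep

    model = GRUModel()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(4, 5, 8), torch.randn(4, 1)

    with higher.innerloop_ctx(model, opt) as (fmodel, diffopt):
        diffopt.step(((fmodel(x) - y) ** 2).mean())   # inner-loop update
        outer_loss = ((fmodel(x) - y) ** 2).mean()
        # Differentiate the post-update loss w.r.t. the initial parameters.
        # At the time of this report, the GRU entries came back as None:
        grads = torch.autograd.grad(outer_loss, fmodel.parameters(time=0),
                                    allow_unused=True)
        for (name, _), g in zip(model.named_parameters(), grads):
            print(name, None if g is None else tuple(g.shape))

On a version of higher where this is fixed, all six parameter gradients (the four GRU tensors plus the linear layer's weight and bias) should be populated rather than the GRU entries being None.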

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 14 (9 by maintainers)

Top GitHub Comments

2 reactions
egrefen commented, Feb 13, 2020

Just to update you, I can also reproduce this error. Eyeballing the current pytorch rnn code, I'm not sure why this is happening, so I will need to loop in someone from the pytorch team.

I note that the unit tests for RNNs in higher are incomplete and wouldn't catch this, so I will fix that first and then try to get this progressed.

This is a blocking issue for my own research, so I will try to get it sorted ASAP.

0 reactions
berlino commented, Feb 27, 2020

@egrefen It works for me as well. Thanks for the effort!
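A related caveat worth knowing when taking second-order gradients through recurrent layers, independent of the fix in higher: PyTorch's cuDNN-fused RNN kernels do not support double backward, so grad-of-grad through a GRU or LSTM on GPU generally requires falling back to the native implementation. A minimal sketch of that guard, with the inner-loop body left as a placeholder:

    import torch

    # cuDNN's fused RNN backward is itself non-differentiable, so code that
    # needs second-order gradients through a GRU/LSTM on GPU usually
    # disables cuDNN around the inner loop and the grad computation.
    with torch.backends.cudnn.flags(enabled=False):
        ...  # run the higher inner loop and torch.autograd.grad call here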

Read more comments on GitHub >

Top Results From Across the Web

  • How GRU solves vanishing gradient - Cross Validated
    To answer your 2nd question: Using GRU, your hope is that you can learn the long-term dependency in a given task but, ...
  • Online Second Order Methods for Non-Convex Stochastic ...
    Abstract: This paper proposes a family of online second-order methods for possibly non-convex stochastic optimizations based ...
  • Illustrated Guide to LSTM's and GRU's: A step by ...
    Gradients are values used to update a neural network's weights. The vanishing gradient problem is when the gradient shrinks as it backpropagates ...
  • None gradient from GRU · Issue #5985
    The problem is that the returned new gradients are all Nones. ... The error I get is: TypeError: Second-order gradient for while ...
  • 12.3. Gradient Descent
    For instance, the optimization problem might diverge due to an overly large ... Second-order methods that look not only at the value and ...
