Can't use chainer.grad for LSTM

See original GitHub issue

I tried to differentiate an LSTM block with the chainer.grad function, passing enable_double_backprop=True, and got the error below.

Code:

import numpy as np
import chainer
import chainer.links as L

# A random float32 input batch of shape (10, 20) and an LSTM link with 20 units.
x = chainer.Variable(np.random.rand(10, 20).astype('f'))
lstm = L.LSTM(20, 20)
y = lstm(x)

# Differentiate y with respect to x, keeping the graph for double backprop.
dydx, = chainer.grad([y], [x], enable_double_backprop=True)

Error:

~/.pyenv/versions/anaconda3-5.0.0/lib/python3.6/site-packages/chainer/functions/activation/lstm.py in backward(self, indexes, grads)
    111     def backward(self, indexes, grads):
    112         grad_inputs = (
--> 113             self.get_retained_inputs() + self.get_retained_outputs() + grads)
    114         return LSTMGrad()(*grad_inputs)

Is this a bug in the LSTM module, or am I using LSTM incorrectly? I'm using Chainer v4.0.0b3.

Any comments will help. Thanks.

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
toslunar commented, Feb 12, 2018
# Continuing the snippet from the question (F is chainer.functions):
dydx, = chainer.grad(
    [y, lstm.c, lstm.h], [x],
    [np.ones_like(y), np.zeros_like(lstm.c), np.zeros_like(lstm.h)],
    enable_double_backprop=True)  # Now succeeds with #4320

loss = 10. * F.mean_squared_error(dydx, np.ones_like(dydx.array))

lstm.cleargrads()
loss.backward()

The above seems to work if the lines

        if ggc_prev is None:
            ggc_prev = xp.zeros_like(c)

are added to LSTMGrad.backward.
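
For context, that guard handles the case where no second-order gradient has flowed into the previous cell state, so ggc_prev arrives as None; substituting zeros lets the rest of the arithmetic proceed. A minimal, self-contained sketch of the same None-to-zeros pattern follows; the helper function and its arguments are illustrative only, not Chainer's actual LSTMGrad.backward:

import numpy as np

def combine_cell_grads(ggc_prev, ggc_next, c):
    # Hypothetical helper: combine second-order gradients flowing into the
    # LSTM cell state. If nothing flowed into c_prev, replace None with
    # zeros of matching shape and dtype so the sum below does not fail.
    if ggc_prev is None:
        ggc_prev = np.zeros_like(c)
    return ggc_prev + ggc_next

The real fix lives inside LSTMGrad.backward (see #4320); the sketch only shows the shape of the None guard.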

0 reactions
toslunar commented, Feb 12, 2018

Sorry,

    [np.ones_like(y), np.zeros_like(lstm.c), np.zeros_like(lstm.h)],

should be

    [np.ones_like(y.data), np.zeros_like(lstm.c.data), np.zeros_like(lstm.h.data)],

It seems that leaving the gradient for lstm.h as None is fine:

dydx, = chainer.grad(
    [y, lstm.c], [x],
    [np.ones_like(y.data), np.zeros_like(lstm.c.data)],
    enable_double_backprop=True)  # Now succeeds with #4320
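
Putting the two comments together, here is a consolidated sketch of the workaround, assuming a Chainer version that includes the fix from #4320; the shapes and the 10-times-MSE loss simply mirror the snippets above:

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

x = chainer.Variable(np.random.rand(10, 20).astype('f'))
lstm = L.LSTM(20, 20)
y = lstm(x)

# Request gradients for y, the cell state c, and the hidden state h together,
# passing explicit output gradients so none of them is left as None.
dydx, = chainer.grad(
    [y, lstm.c, lstm.h], [x],
    [np.ones_like(y.data), np.zeros_like(lstm.c.data), np.zeros_like(lstm.h.data)],
    enable_double_backprop=True)

# Second-order use of dydx: build a loss on the gradient itself and
# backpropagate through it into the LSTM parameters.
loss = 10. * F.mean_squared_error(dydx, np.ones_like(dydx.array))
lstm.cleargrads()
loss.backward()

Per the last comment, the lstm.h term can also be dropped, leaving its gradient as None.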
