
Document upstream gradient behavior for functions with multiple outputs

See original GitHub issue

When a function has several outputs and the user calls backward through one of them, it is not obvious that the upstream gradients of all the other outputs are also collected. This is probably most noticeable when the last function in the graph has multiple outputs. The following is such a case.

import numpy as np
import chainer
from chainer import Variable

x = chainer.Variable(np.arange(4, dtype='f'))
ys = chainer.functions.split_axis(x, 2, axis=0)
for y in ys:
    # Give both outputs an upstream gradient of 3.
    y.grad_var = chainer.Variable(np.full_like(y.array, 3, dtype='f'))
ys[0].backward()
x.grad  # [3, 3, 3, 3]. Some users might expect [3, 3, 0, 0] since ys[1] is not involved?
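For contrast, here is a minimal sketch (not part of the original report) of how to get the [3, 3, 0, 0] result that some users might expect: explicitly give the uninvolved output a zero gradient, so it contributes nothing when the upstream gradients are collected.

import numpy as np
import chainer

x = chainer.Variable(np.arange(4, dtype='f'))
ys = chainer.functions.split_axis(x, 2, axis=0)

# Only ys[0] carries a meaningful upstream gradient; ys[1] gets zeros.
ys[0].grad_var = chainer.Variable(np.full_like(ys[0].array, 3, dtype='f'))
ys[1].grad_var = chainer.Variable(np.zeros_like(ys[1].array))

ys[0].backward()
print(x.grad)  # [3. 3. 0. 0.]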

How about documenting this behavior, or is it already documented?

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

2 reactions
beam2d commented, Nov 27, 2018

This feature has been used as a trick for starting backprop from multiple root variables with Variable.backward: to start backprop from variables x and y, one can feed them to F.identity and then call backward on one of the outputs.

x, y = F.identity(x, y)
x.grad = ...
y.grad = ...
x.backward()

I know that this snippet is too tricky, and it’s not good to let users rely on such a trick, but we should at least provide an alternative way to accomplish the same goal if we remove the feature discussed in this issue. One idea is to provide a functional version of Variable.backward, say chainer.backward(ys), which accepts multiple Variables to start with.
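
For reference, here is a more complete, runnable version of the trick sketched above (the concrete shapes and gradient values are illustrative, not taken from the comment):

import numpy as np
import chainer
import chainer.functions as F

# Two independent root variables to start backprop from.
x = chainer.Variable(np.ones(3, dtype='f'))
y = chainer.Variable(np.ones(3, dtype='f'))

# F.identity returns one output per input, and the outputs share a single
# function node, so backward through one output also collects the other's gradient.
x_out, y_out = F.identity(x, y)
x_out.grad = np.full(3, 2, dtype='f')
y_out.grad = np.full(3, 5, dtype='f')

x_out.backward()
print(x.grad)  # [2. 2. 2.]
print(y.grad)  # [5. 5. 5.]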

0 reactions
stale[bot] commented, Oct 30, 2019

This issue is closed as announced. Feel free to re-open it if needed.
