Document upstream gradient behavior for functions with multiple outputs
When a function has several outputs and the user calls backward through one of them, it is not obvious that all of the other upstream gradients are collected as well. This is probably most relevant when the last function in the graph has multiple outputs. The following is such a case.
```python
import numpy as np
import chainer

x = chainer.Variable(np.arange(4, dtype='f'))
ys = chainer.functions.split_axis(x, 2, axis=0)
for y in ys:
    # Set an upstream gradient on each output of split_axis.
    y.grad_var = chainer.Variable(np.full_like(y.array, 3, dtype='f'))
ys[0].backward()
x.grad  # [3, 3, 3, 3]. Some users might expect [3, 3, 0, 0] since ys[1] is not involved?
```
How about documenting this behavior, or is it documented already?
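For comparison, here is a minimal sketch of how a user could obtain the `[3, 3, 0, 0]` result instead, assuming `chainer.grad`, which starts backprop only from the outputs explicitly passed to it (this snippet is illustrative and not part of the original report):

```python
import numpy as np
import chainer

x = chainer.Variable(np.arange(4, dtype='f'))
ys = chainer.functions.split_axis(x, 2, axis=0)

# chainer.grad propagates gradients only from the listed outputs,
# so ys[1] contributes nothing and its slice of x.grad stays zero.
g0 = chainer.Variable(np.full_like(ys[0].array, 3, dtype='f'))
gx, = chainer.grad([ys[0]], [x], grad_outputs=[g0])
print(gx.array)  # [3. 3. 0. 0.]
```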
Issue Analytics
- State: Closed
- Created 5 years ago
- Comments: 9 (4 by maintainers)
Top GitHub Comments
This feature has been used through a trick when one wants to start backprop from multiple root variables with `Variable.backward`. When one wants to start backprop from variables `x` and `y`, one can feed them to `F.identity` and then call `backward` on one of the outputs.

I know that this snippet is too tricky, and it's not good to let users rely on such a trick, but we should at least provide an alternative way to accomplish the same goal if we remove the feature discussed in this issue. One idea is to provide a functional version of `Variable.backward`, say `chainer.backward(ys)`, which accepts multiple Variables to start with.
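A minimal sketch of the trick described in this comment; the toy computation and variable names are illustrative, not from the issue:

```python
import numpy as np
import chainer
import chainer.functions as F

w = chainer.Variable(np.array([5., 6.], dtype='f'))
a = w * 2  # first branch we want to backprop from
b = w * 3  # second branch we want to backprop from

# Route both branches through F.identity so they become sibling
# outputs of a single function node, set an upstream gradient on
# each output, then call backward() on just one of them; the
# gradient set on the other sibling is collected as well, which is
# exactly the behavior discussed in this issue.
ia, ib = F.identity(a, b)
ia.grad = np.ones(2, dtype='f')
ib.grad = np.ones(2, dtype='f')
ia.backward()

print(w.grad)  # [5. 5.] -- contributions 2 and 3 from both branches
```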
This issue is closed as announced. Feel free to re-open it if needed.