Why does `Copy` compute gradients in reversed order?
Nice work! I am confused about why `Copy` computes its gradients in reversed order.
https://github.com/kakaobrain/torchgpipe/blob/fca5d65fb68edddf1b056443d014c3aa7f416431/torchgpipe/copy.py#L59-L71
If I were writing this part, I would implement it directly, without reversing the order, like this:
# Inside Copy.backward: 'prev_stream' and 'next_stream' come from ctx and
# 'grad_output' from autograd; the helpers are from torchgpipe.stream.
from typing import List
from torch import Tensor
from torchgpipe.stream import current_stream, get_device, record_stream, use_stream

grad_input: List[Tensor] = []
input_stream = current_stream(get_device(prev_stream))

with use_stream(prev_stream), use_stream(next_stream):
    for x in grad_output:
        y = x.to(get_device(prev_stream))
        grad_input.append(y)

        # 'next_stream' is not where 'x' has been allocated.
        record_stream(x, next_stream)
        # 'y' has been allocated on 'prev_stream'.
        # It might be used on the current stream captured as 'input_stream'.
        record_stream(y, input_stream)
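
For comparison, the linked lines iterate the gradients in reverse and then restore the original order with a deque. Roughly paraphrased from the link above (not quoted verbatim):

from collections import deque
from typing import Deque

grad_input: Deque[Tensor] = deque(maxlen=len(grad_output))
input_stream = current_stream(get_device(prev_stream))

with use_stream(prev_stream), use_stream(next_stream):
    for x in reversed(grad_output):
        y = x.to(get_device(prev_stream))
        # appendleft keeps the returned gradients in their original order
        # even though the copy kernels are issued in reverse.
        grad_input.appendleft(y)
        # ... plus the same record_stream bookkeeping as in the snippet above.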
Is there something I am not taking into account? Could you explain it? Thank you very much.
Issue Analytics
- State: Closed
- Created 3 years ago
- Comments: 6 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Sorry for the confusion.
I usually inspect a CUDA timeline to improve computational performance, so the readability of the CUDA timeline is important for my use case. When every kernel is ordered consistently, I can easily assume that the forward and backward timelines are symmetric with each other. There is no other big advantage to consistent ordering, but there is also no reason to order the kernels inconsistently.
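
To make the ordering concrete, here is a small standalone sketch (plain Python, no CUDA; the function names and the 'copy(...)' strings are made up for illustration). Issuing the backward copies in reverse mirrors the forward timeline, while a deque with appendleft still hands the gradients back in their original order:

from collections import deque
from typing import Deque, List

def forward_copies(tensors: List[str]) -> List[str]:
    # Forward pass: copy kernels are issued in order a, b, c.
    out = []
    for t in tensors:
        out.append('copy(%s)' % t)
    return out

def backward_copies(grads: List[str]) -> List[str]:
    # Backward pass: kernels are issued in reverse (c, b, a) so the timeline
    # mirrors the forward pass, but appendleft restores the original order
    # for the values returned to autograd.
    out: Deque[str] = deque(maxlen=len(grads))
    for g in reversed(grads):
        out.appendleft('copy(%s)' % g)
    return list(out)

print(forward_copies(['a', 'b', 'c']))   # ['copy(a)', 'copy(b)', 'copy(c)']
print(backward_copies(['a', 'b', 'c']))  # ['copy(a)', 'copy(b)', 'copy(c)']

Only the order in which the copies are issued differs; the returned sequence is identical either way.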
@sublee Thanks a lot. I will close the issue.