
Why does `Copy` compute gradients in reversed order?

See original GitHub issue

Niiiice work! I am confused about why `Copy` computes gradients in reversed order: https://github.com/kakaobrain/torchgpipe/blob/fca5d65fb68edddf1b056443d014c3aa7f416431/torchgpipe/copy.py#L59-L71

If I were to write this part, I would implement it directly in forward order:

    grad_input: List[Tensor] = []
    input_stream = current_stream(get_device(prev_stream))

    with use_stream(prev_stream), use_stream(next_stream):
        for x in grad_output:
            y = x.to(get_device(prev_stream))
            grad_input.append(y)

            # 'next_stream' is not where 'x' has been allocated.
            record_stream(x, next_stream)
            # 'y' has been allocated on 'prev_stream'.
            # It might be used on the current stream captured as 'input_stream'.
            record_stream(y, input_stream)

Is there something I am not taking into account? Could you explain it? Thank you very much.
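
For comparison, here is a sketch of how the reversed-order variant asked about above can be read (a paraphrase inferred from the snippet and the issue title, not a verbatim copy of the linked torchgpipe source): the gradients are walked last-to-first and prepended to a deque, so grad_input still ends up in the original order and only the launch order of the copy kernels differs.

    # Same context as the snippet above (inside Copy.backward), assuming
    # 'from collections import deque' and 'from typing import Deque'.
    grad_input: Deque[Tensor] = deque(maxlen=len(grad_output))
    input_stream = current_stream(get_device(prev_stream))

    with use_stream(prev_stream), use_stream(next_stream):
        for x in reversed(grad_output):
            y = x.to(get_device(prev_stream))
            # Prepending keeps grad_input in the original order even though
            # the copy kernels are enqueued last-to-first.
            grad_input.appendleft(y)

            # Same stream bookkeeping as in the forward-order snippet.
            record_stream(x, next_stream)
            record_stream(y, input_stream)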

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
sublee commented, Jul 31, 2020

Sorry to confuse you.

I've usually inspected a CUDA timeline to improve computational performance, and the readability of that timeline is important for my use case. When every kernel is ordered consistently, I can easily assume that the forward and backward timelines are symmetric with each other. There is no other big advantage to the consistent ordering, but I also think there is no reason to order the kernels inconsistently.
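
(As an aside not taken from the thread: one way to capture such a CUDA timeline is PyTorch's autograd profiler with a Chrome-trace export. The tiny module and file name below are placeholder assumptions, not anything torchgpipe-specific.)

    import torch

    # Placeholder module standing in for one pipeline partition.
    model = torch.nn.Linear(1024, 1024).cuda()
    x = torch.randn(64, 1024, device="cuda", requires_grad=True)

    # use_cuda=True records CUDA kernel activity alongside CPU ops.
    with torch.autograd.profiler.profile(use_cuda=True) as prof:
        model(x).sum().backward()

    # Open the exported trace in chrome://tracing to check whether the
    # forward and backward kernels appear in mirrored (symmetric) order.
    prof.export_chrome_trace("trace.json")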

0 reactions
MlWoo commented, Aug 3, 2020

@sublee Thanks a lot. I will close the issue.


Top Results From Across the Web

  • Introduction to gradients and automatic differentiation
    In this guide, you will explore ways to compute gradients with TensorFlow, ... traverses this list of operations in reverse order to compute...
  • connection between loss.backward() and optimizer.step() - ...
    The gradients are "stored" by the tensors themselves (they have grad and requires_grad attributes) once you call backward() on the loss....
  • Reverse-mode automatic differentiation: a tutorial
    In this post, I'll walk through the mathematical formalism of reverse-mode automatic differentiation (AD) and try to explain some simple ...
  • Lecture 6: Backpropagation - YouTube
    Lecture 6 discusses the backpropagation algorithm for efficiently computing gradients of complex functions.
  • Quantum gradients with backpropagation
    During the forward pass, the results of all intermediate subexpressions are stored; the computation is then traversed in reverse, with the ...
