Implementing an adjoint calculation for backprop-ing through time
Should consider the performance benefit of implementing an adjoint calculation for the backward pass through the forward() method in WaveCell. This would potentially save memory during gradient computation, because PyTorch wouldn't need to construct as large a graph.
The approach is described here: https://pytorch.org/docs/stable/notes/extending.html
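As a rough illustration of the pattern those docs describe, here is a minimal sketch of a custom torch.autograd.Function whose backward() hand-writes the adjoint of one time step. The update rule h_{t+1} = h_t A^T + x_t B^T is a toy stand-in, not the actual WaveCell update, and the names StepFn, A, B, and the tensor shapes are all assumptions for illustration:

```python
import torch

# Hypothetical sketch: a custom autograd.Function with a hand-written
# backward (adjoint) for one time step. The linear update used here is a
# stand-in for the real WaveCell step. Because backward() is written by
# hand, autograd stores only the tensors saved explicitly below instead of
# a graph node for every intermediate op inside the step.
class StepFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, h, x, A, B):
        ctx.save_for_backward(h, x, A, B)
        return h @ A.t() + x @ B.t()

    @staticmethod
    def backward(ctx, grad_out):
        h, x, A, B = ctx.saved_tensors
        # Adjoint of the linear step: pull grad_out back through each input.
        grad_h = grad_out @ A      # gradient w.r.t. the incoming state
        grad_x = grad_out @ B      # gradient w.r.t. the step's input
        grad_A = grad_out.t() @ h  # parameter gradients, summed over batch
        grad_B = grad_out.t() @ x
        return grad_h, grad_x, grad_A, grad_B
```

The time loop would then chain StepFn.apply(h, x_t, A, B) over t, and torch.autograd.gradcheck can verify the hand-written adjoint against finite differences on small double-precision inputs.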
Issue Analytics
- State:
- Created: 4 years ago
- Comments: 8 (1 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Done, @twhughes
This is now partially implemented. Currently, the individual time step is a primitive. This seems to help with memory utilization during training, especially with nonlinearity. Perhaps we could investigate if there would be significant performance benefits from adjoint-ing the time loop as well.
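For reference, adjoint-ing the time loop itself might look something like the hypothetical sketch below, which reuses the toy linear step from the earlier example. forward() runs the whole loop without building a graph and saves only the state history; backward() then sweeps an adjoint state backwards through time. The name LoopFn and every detail of the update are assumptions, not the project's actual code:

```python
import torch

# Hypothetical sketch: an adjoint over the entire time loop for the toy
# step h_{t+1} = h_t A^T + x_t B^T. Memory cost is the saved states only,
# not a full autograd graph across all T steps.
class LoopFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, h0, xs, A, B):  # xs: (T, batch, in_features)
        hs = [h0]
        with torch.no_grad():
            for x in xs.unbind(0):
                hs.append(hs[-1] @ A.t() + x @ B.t())
        # Save the inputs to each step (h_0 .. h_{T-1}), not the graph.
        ctx.save_for_backward(xs, A, B, torch.stack(hs[:-1]))
        return hs[-1]

    @staticmethod
    def backward(ctx, grad_out):
        xs, A, B, hs = ctx.saved_tensors
        lam = grad_out                 # adjoint state at the final time
        grad_A = torch.zeros_like(A)
        grad_B = torch.zeros_like(B)
        grad_xs = torch.empty_like(xs)
        for t in range(xs.shape[0] - 1, -1, -1):
            grad_A += lam.t() @ hs[t]  # accumulate parameter gradients
            grad_B += lam.t() @ xs[t]
            grad_xs[t] = lam @ B
            lam = lam @ A              # reverse-time recursion: lam_t = lam_{t+1} A
        return lam, grad_xs, grad_A, grad_B  # lam is now dL/dh0
```

An alternative that captures much of the same memory benefit with less hand-written math is torch.utils.checkpoint, which recomputes each step's intermediates during the backward pass instead of storing them.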