Implementing an adjoint calculation for backprop-ing through time
Should consider the performance benefit of implementing an adjoint calculation for the backward pass through the forward() method in WaveCell. This would potentially save memory during gradient computation, because PyTorch wouldn't need to construct as large a graph.
The approach is described here: https://pytorch.org/docs/stable/notes/extending.html
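As a rough illustration of the pattern those docs describe, here is a minimal sketch of a custom torch.autograd.Function whose backward() hand-writes the adjoint of one time step. The update rule h_{t+1} = h_t A^T + x_t B^T is a toy stand-in, not the actual WaveCell update, and the names StepFn, A, B, and the tensor shapes are all assumptions for illustration:

```python
import torch

# Hypothetical sketch: a custom autograd.Function with a hand-written
# backward (adjoint) for one time step. The linear update used here is a
# stand-in for the real WaveCell step. Because backward() is written by
# hand, autograd stores only the tensors saved explicitly below instead of
# a graph node for every intermediate op inside the step.
class StepFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, h, x, A, B):
        ctx.save_for_backward(h, x, A, B)
        return h @ A.t() + x @ B.t()

    @staticmethod
    def backward(ctx, grad_out):
        h, x, A, B = ctx.saved_tensors
        # Adjoint of the linear step: pull grad_out back through each input.
        grad_h = grad_out @ A      # gradient w.r.t. the incoming state
        grad_x = grad_out @ B      # gradient w.r.t. the step's input
        grad_A = grad_out.t() @ h  # parameter gradients, summed over batch
        grad_B = grad_out.t() @ x
        return grad_h, grad_x, grad_A, grad_B
```

The time loop would then chain StepFn.apply(h, x_t, A, B) over t, and torch.autograd.gradcheck can verify the hand-written adjoint against finite differences on small double-precision inputs.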
Issue Analytics
- State:
- Created: 4 years ago
- Comments: 8 (1 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Done, @twhughes
This is now partially implemented. Currently, the individual time step is a primitive. This seems to help with memory utilization during training, especially with nonlinearity. Perhaps we could investigate if there would be significant performance benefits from adjoint-ing the time loop as well.
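For reference, adjoint-ing the time loop itself might look something like the hypothetical sketch below, which reuses the toy linear step from the earlier example. forward() runs the whole loop without building a graph and saves only the state history; backward() then sweeps an adjoint state backwards through time. The name LoopFn and every detail of the update are assumptions, not the project's actual code:

```python
import torch

# Hypothetical sketch: an adjoint over the entire time loop for the toy
# step h_{t+1} = h_t A^T + x_t B^T. Memory cost is the saved states only,
# not a full autograd graph across all T steps.
class LoopFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, h0, xs, A, B):  # xs: (T, batch, in_features)
        hs = [h0]
        with torch.no_grad():
            for x in xs.unbind(0):
                hs.append(hs[-1] @ A.t() + x @ B.t())
        # Save the inputs to each step (h_0 .. h_{T-1}), not the graph.
        ctx.save_for_backward(xs, A, B, torch.stack(hs[:-1]))
        return hs[-1]

    @staticmethod
    def backward(ctx, grad_out):
        xs, A, B, hs = ctx.saved_tensors
        lam = grad_out                 # adjoint state at the final time
        grad_A = torch.zeros_like(A)
        grad_B = torch.zeros_like(B)
        grad_xs = torch.empty_like(xs)
        for t in range(xs.shape[0] - 1, -1, -1):
            grad_A += lam.t() @ hs[t]  # accumulate parameter gradients
            grad_B += lam.t() @ xs[t]
            grad_xs[t] = lam @ B
            lam = lam @ A              # reverse-time recursion: lam_t = lam_{t+1} A
        return lam, grad_xs, grad_A, grad_B  # lam is now dL/dh0
```

An alternative that captures much of the same memory benefit with less hand-written math is torch.utils.checkpoint, which recomputes each step's intermediates during the backward pass instead of storing them.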