Inverse Accumulation Mode
Inverted Jacobian products are useful in a variety of algorithms, such as the efficient implementation of Newton's method with regularization. However, JAX currently only provides non-inverted Jacobian products (`jvp` and `vjp`).
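For concreteness, a regularized Newton step applies exactly such a product; here is a minimal sketch in current JAX, where the dense solve stands in for what a native inverted-Jacobian product would compute without ever materializing the Jacobian:

```python
import jax
import jax.numpy as jnp

# Minimal sketch (illustration only): one regularized Newton step for
# solving f(x) = 0. The dense solve is the inverted-Jacobian product
# that a built-in primitive could compute matrix-free.
def newton_step(f, x, damping=1e-3):
    J = jax.jacfwd(f)(x)  # n x n Jacobian at x
    return x - jnp.linalg.solve(J + damping * jnp.eye(x.size), f(x))
```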
It appears that it is possible to efficiently implement inverted Jacobian products in an automatic differentiation library like JAX thanks to a recent paper:
Siskind, Jeffrey Mark. “Automatic Differentiation: Inverse Accumulation Mode.” (2019).
The interface for the inverted Jacobians could be something like: `ijvp`, which accepts a function and primals and produces a function `f_ijvp` that accepts input cotangents and produces output cotangents; and `ivjp`, which accepts a function and primals and produces a function `f_ivjp` that accepts output tangents and produces input tangents. Using these, we could also produce `inverse_jacfwd` and `inverse_jacrev`, one of which could be mapped to `inverse_jacobian`.
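For concreteness, here is a dense reference semantics for the proposal (a sketch only; the point of the paper is precisely to avoid materializing and solving against the full Jacobian like this):

```python
import jax
import jax.numpy as jnp

def ijvp(f, primals):
    # Sketch: f_ijvp maps input cotangents to output cotangents,
    # i.e. ct_in @ inv(J); we solve J^T x = ct_in rather than form inv(J).
    J = jax.jacfwd(f)(*primals)
    def f_ijvp(ct_in):
        return jnp.linalg.solve(J.T, ct_in)
    return f_ijvp

def ivjp(f, primals):
    # Sketch: f_ivjp maps output tangents to input tangents, i.e. inv(J) @ t_out.
    J = jax.jacfwd(f)(*primals)
    def f_ivjp(t_out):
        return jnp.linalg.solve(J, t_out)
    return f_ivjp
```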
Has anyone on the JAX team looked into this?
Thanks to the inverse function theorem, another way to compute the same thing is to compose `jvp`, `vjp`, `jacfwd`, or `jacrev` with the `oryx.core.inverse` transformation. That is, for f : R^n -> R^n we have ∂(f^{-1})(f(x)) == inv(∂f(x)) pointwise, where inv denotes dense matrix inverse. Actually, I'd be interested if that ends up generating the same computation as discussed in that paper (which I admit not to have read yet, beyond the abstract!).
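A quick numeric check of that identity, with a hand-written inverse standing in for what `oryx.core.inverse` would derive automatically (the function `f` here is a toy example of ours):

```python
import jax
import jax.numpy as jnp

def f(x):
    # an invertible map R^2 -> R^2
    return jnp.array([jnp.exp(x[0]), x[0] + x[1] ** 3])

def f_inv(y):
    # its inverse, written by hand for this check
    x0 = jnp.log(y[0])
    return jnp.array([x0, jnp.cbrt(y[1] - x0)])

x = jnp.array([0.3, 1.7])
y = f(x)
lhs = jax.jacfwd(f_inv)(y)                # ∂(f^{-1})(f(x))
rhs = jnp.linalg.inv(jax.jacfwd(f)(x))    # inv(∂f(x))
print(jnp.allclose(lhs, rhs, atol=1e-5))  # True
```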
I think `jacfwd`-of-`oryx.core.inverse`-of-`f` would in general be a fairly different computation than `jnp.linalg.inv`-of-`jacfwd`-of-`f`, because the former would exploit sparsity structure represented in the program and its dataflow, whereas the latter would just be operating on a dense matrix.

Indeed, the advantage of our approach is that it is compositional and thus has running time proportional to the primal. But like ordinary reverse mode, it requires a tape whose size is proportional to the running time.
There is no need to do Gaussian elimination. The only inversion is that of the scalar a in (5a right) because steps (4) involve only unary and binary operations.
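A toy sketch of this point (our own construction, not code from the paper): for a chain of equasive unary/binary steps, the inverse-Jacobian-vector product composes per-step inverses in reverse program order, each costing one back-substitution and at most one scalar division.

```python
import jax
import jax.numpy as jnp

def step1(s):   # (a, b) -> (a, b + a**2); Jacobian [[1, 0], [2a, 1]]
    a, b = s
    return jnp.array([a, b + a**2])

def step2(s):   # (a, b) -> (a*b, b); Jacobian [[b, a], [0, 1]]
    a, b = s
    return jnp.array([a * b, b])

def f(x):
    return step2(step1(x))

def inverse_jvp_chain(x, t_out):
    # Replay the primal to record intermediate values (the "tape").
    a0, _ = x
    a1, b1 = step1(x)
    # Invert step2's Jacobian: the only true inversion is dividing by b1.
    u0, u1 = t_out
    v1 = u1
    v0 = (u0 - a1 * v1) / b1
    # Invert step1's Jacobian: unit triangular, so no division at all.
    w0 = v0
    w1 = v1 - 2.0 * a0 * v0
    return jnp.array([w0, w1])

x = jnp.array([1.5, 0.7])
t_out = jnp.array([0.2, -0.3])
J = jax.jacfwd(f)(x)  # dense check
print(jnp.allclose(inverse_jvp_chain(x, t_out), jnp.linalg.solve(J, t_out)))
```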
The catch is that each step must be what we call "equasive": it must have the same number of inputs and outputs. (We call a step that has more outputs than inputs "expansive" and a step that has fewer outputs than inputs "contractive".) Equasive steps have square Jacobians; expansive and contractive steps do not. Non-square Jacobians are not invertible.
It is possible that the overall computation is equasive but the individual steps are not. If the dimension of the intermediate state ever drops below the input/output dimension, then the Jacobian is not invertible. But it is possible that as the computation progresses, the dimension increases and then decreases, perhaps more than once, never going below the input/output dimension. It is further possible that the dimension varies over the course of the computation (never dropping below the input/output dimension) but at one or more intermediate points returns to the input/output dimension. If this is the case, it is possible to split the computation at those points into a sequence of equasive chunks we call "lumps", and then to apply the method to that sequence of lumps. When doing this, the lumps would no longer be steps consisting of unary and binary operations. Thus instead of (5a) you have (5b), where in (5b, right) you have to invert A. The saving grace is that the dimension of A is likely to be (much) smaller than the input/output dimension of the whole problem.
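A hypothetical sketch of that lump-wise variant, with toy lumps of our own and `jax.jacfwd` standing in for however each lump's A would actually be accumulated:

```python
import jax
import jax.numpy as jnp

def lump1(x):
    # R^2 -> R^2, but internally widens to R^3 before contracting back
    t = jnp.array([x[0], x[1], x[0] * x[1]])       # expansive step
    return jnp.array([t[0] + t[2], t[1] - t[2]])   # contractive step

def lump2(x):
    # another equasive chunk
    return jnp.array([jnp.exp(x[0]), x[0] + x[1]])

def inverse_jvp_lumps(lumps, x, t_out):
    # Replay the primal to record each lump's input (the tape).
    inputs = [x]
    for g in lumps[:-1]:
        inputs.append(g(inputs[-1]))
    v = t_out
    for g, xi in zip(reversed(lumps), reversed(inputs)):
        A = jax.jacfwd(g)(xi)        # this lump's small square Jacobian
        v = jnp.linalg.solve(A, v)   # invert only A, never the full chain
    return v

x = jnp.array([0.4, 1.2])
f = lambda z: lump2(lump1(z))
t = jax.jacfwd(f)(x) @ jnp.array([1.0, 2.0])  # push a known tangent forward
print(jnp.allclose(inverse_jvp_lumps([lump1, lump2], x, t),
                   jnp.array([1.0, 2.0])))    # ...and recover it: True
```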
We implemented the stepwise equasive scheme in a variant of R6RS-AD.
https://github.com/qobi/R6RS-AD
Note that JAX is based on HIPS Autograd, which is based on R6RS-AD. R6RS-AD predates HIPS Autograd by about 7 years and predates JAX by about a decade. R6RS-AD was used in:

```bibtex
@inproceedings{nips2011,
  author    = {D. Wingate and N. Goodman and A. Stuhlm{\"u}ller and J. M. Siskind},
  title     = {Nonstandard Interpretations of Probabilistic Programs for Efficient Inference},
  booktitle = nips,
  location  = {Granada, Spain},
  day       = {12--15},
  month     = dec,
  year      = 2011,
  url       = {http://engineering.purdue.edu/~qobi/papers/nips2011.pdf}
}
```
The implementation is straightforward and enclosed. We worked on methods to automatically divide an arbitrary computation graph into lumps (what we call "lumpification"), but that work is not complete. It is complicated because the dimension of the intermediate state depends on how you schedule the operations; thus the possible lumpifications depend on scheduling. It appears to be NP-hard to schedule optimally so as to minimize the dimension of the A of the maximal lump.
The link to StackExchange discusses the Moore-Penrose inverse of non-square Jacobians. We spent some time investigating this, as well as a variety of other pseudoinverses besides the Moore-Penrose pseudoinverse. We are unaware of any pseudoinverse that is compositional, and compositionality is required to make (3) work. There might be one that is compositional (and useful) that we are unaware of. It also might be the case that a product of Moore-Penrose (or other) pseudoinverses, while not preserving the properties of that pseudoinverse, might still be useful. We never got very far along this line of investigation.
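For a concrete sense of the obstacle, the Moore-Penrose pseudoinverse already fails to compose on the simplest contractive/expansive pair (our example, not from the discussion above):

```python
import jax.numpy as jnp

# pinv(A @ B) != pinv(B) @ pinv(A) in general, so the chain rule for
# inverse products, which relies on inv(J2 @ J1) == inv(J1) @ inv(J2),
# does not carry over to pseudoinverses.
A = jnp.array([[1.0, 0.0]])    # 1x2: contractive
B = jnp.array([[1.0], [1.0]])  # 2x1: expansive
lhs = jnp.linalg.pinv(A @ B)                   # [[1.0]]
rhs = jnp.linalg.pinv(B) @ jnp.linalg.pinv(A)  # [[0.5]]
print(lhs, rhs)  # they disagree, so compositionality fails
```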