[RFC][WIP] Tensor Expression level automatic differentiation
I’m working on automatic differentiation at the level of compute expressions, and I would like to share some progress and hear any comments. Currently the automatic differentiation works well enough for some operations that it is possible to train a simple model; here is a tutorial on how to do this. Yet for many operations the performance is still unacceptable, but I’m working on it.
My implementation mostly follows this paper. In this notebook I describe how it works internally and give a list of operations which are known to work or not to work. Basically, the AD consists of two parts:
- The automatic differentiation itself, which simply differentiates expressions according to the well-known rules and produces inefficient expressions (see the toy example after this list). The code is here.
- A set of transformations to optimize the resulting inefficient expressions. The code is here.
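To make the first part concrete, here is a toy sketch. It is written in the modern `tvm.te` spelling and only mimics the kind of expression naive differentiation produces; it is not the branch's actual output:

```python
import tvm
from tvm import te

# Toy example: B[i] = sum_k A[i, k], differentiated w.r.t. A.
m, n = 10, 10
A = te.placeholder((m, n), name="A")
k = te.reduce_axis((0, n), name="k")
B = te.compute((m,), lambda i: te.sum(A[i, k], axis=k), name="B")

# The textbook rule gives dB[i]/dA[p, q] = (i == p ? 1 : 0), so the
# naive adjoint of A sums a Kronecker delta over the whole range of
# i2, even though only the single point i2 == p contributes:
head = te.placeholder((m,), name="head")  # adjoint of B
i2 = te.reduce_axis((0, m), name="i2")
grad_A = te.compute(
    (m, n),
    lambda p, q: te.sum(head[i2] * te.if_then_else(i2 == p, 1.0, 0.0),
                        axis=i2),
    name="grad_A",
)
```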
All transformations work at the level of compute expressions (before scheduling). Their general goal is to eliminate summation over zeros by moving conditional expressions of the form `cond ? val : 0` upward and then using them to simplify the iteration domains of reductions. Hopefully these transformations will be useful for other tasks besides AD once they are powerful enough. Currently the main problem is that they don’t understand modular arithmetic, which is needed for differentiating dilated and strided convolutions and for the flattening operation; the sketch after this paragraph shows both the domain simplification and where modular arithmetic enters.
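Continuing the toy example above (again only a sketch of the idea, not the branch's output):

```python
import tvm
from tvm import te

m, n = 10, 10
head = te.placeholder((m,), name="head")

# Before: grad_A[p, q] = sum_{i2} (i2 == p ? head[i2] : 0).
# Lifting the condition out of the sum pins the reduction domain
# of i2 to the single point i2 == p, so the sum over zeros
# disappears entirely:
grad_A = te.compute((m, n), lambda p, q: head[p], name="grad_A")

# For a stride-s operation the lifted condition instead looks like
#   (q * s == p ? head[q] : 0),
# and eliminating the sum requires divisibility reasoning:
#   grad[p] = (p % s == 0) ? head[p // s] : 0
# which is exactly the modular arithmetic the transformations do
# not yet understand.
```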
- The git branch
- The squashed commit
- The tutorial on training a simple model
- The notebook describing some internals
Top GitHub Comments
Hello everyone. I want to tell you about the current status of tensor expression automatic differentiation. The latest version can be found here. The main improvements are as follows:
- There is now a class `Domain` which represents an iteration domain (a set of integer tuples, usually convex), and most of the functions transform domains into other domains, returning objects of the class `DomainTransformation` which represent two domains and the variable mappings between them (a rough sketch of these abstractions follows below).

However, there are several problems which are TVM-related and should be addressed before creating pull requests.
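As a rough illustration of the shape of the `Domain` / `DomainTransformation` abstractions described above (field names are my guesses based on that description, not the branch's actual interface):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical sketch only: the real classes live in the branch
# and certainly differ in detail.

@dataclass
class Domain:
    """An iteration domain: a set of integer tuples, described by
    variables with ranges plus a list of constraining conditions."""
    variables: List[str]
    ranges: Dict[str, Tuple[int, int]]  # var -> (min, extent)
    conditions: List[str] = field(default_factory=list)

@dataclass
class DomainTransformation:
    """The result of transforming a domain: the old and new domains
    together with variable mappings in both directions."""
    old_domain: Domain
    new_domain: Domain
    old_to_new: Dict[str, str]  # new vars expressed via old ones
    new_to_old: Dict[str, str]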
I’ve updated our automatic differentiation branch. Now the result of differentiating flatten is acceptable, and operations like max pool work better as well. We have also improved the API; the simple use-case looks pretty much the same up to function renaming:
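For reference, the simple use-case looks roughly like this. The module path and exact signature of `differentiate` are assumptions based on the linked tutorial, so treat this as a sketch:

```python
import tvm
from tvm import te

# A tiny model: loss = sum((x @ w)^2)
x = te.placeholder((32, 10), name="x")
w = te.placeholder((10, 10), name="w")
k = te.reduce_axis((0, 10), name="k")
y = te.compute((32, 10),
               lambda i, j: te.sum(x[i, k] * w[k, j], axis=k), name="y")
i2 = te.reduce_axis((0, 32), name="i2")
j2 = te.reduce_axis((0, 10), name="j2")
loss = te.compute((1,),
                  lambda _: te.sum(y[i2, j2] * y[i2, j2], axis=[i2, j2]),
                  name="loss")

# Hypothetical spelling of the renamed entry point: differentiate
# `loss` with respect to `w` and get its adjoint back.
[dw] = tvm.differentiate(loss, [w])
```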
(The function `differentiate` is defined here, and here is a small tutorial.) However, it is now possible to get individual adjoints from the result, which may be useful for manually scheduling intermediate tensors. And it is also possible to override the Jacobian computation for some tensors, which may be useful when autodiff does a poor job. A hypothetical sketch of both follows.
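Continuing the sketch above; the attribute and parameter names here are guesses based on the prose, not the branch's documented interface:

```python
# Differentiate without unpacking to keep the whole result object.
res = tvm.differentiate(loss, [x, w])

# Individual adjoints, e.g. to schedule an intermediate manually
# (`adjoints` is a hypothetical attribute name):
dw = res.adjoints[w]

# Overriding the Jacobian computation for a particular tensor,
# e.g. supplying a hand-written rule for differentiating y
# (`override` and the callback signature are guesses):
def my_fdiff(out, inputs, head):
    # ...build and return the adjoints of `inputs`, given `head`,
    # the adjoint of `out`...
    raise NotImplementedError

res = tvm.differentiate(loss, [w], override={y: ([x, w], my_fdiff)})
```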