
[Relay][RFC] Automatic Differentiation


This RFC aims to pave the road to add automatic differentiation (AD) to Relay, both a first-order and a higher-order algorithm. An AD algorithm calculates the gradient of a program, which is needed for training (back propagation). Additionally, higher-order gradients are sometimes needed for other optimization algorithms such as Newton’s method.

Because Relay supports closures and control flow, and there are further plans to add features like algebraic data types (#2175), our implementation of AD must support back propagation over these constructs. For the implementation, we plan to closely follow “Demystifying Differentiable Programming.” On a high level, our approach generates a graph at runtime and runs reverse-mode automatic differentiation on it as if it were a static graph. This algorithm easily supports closures, control flow, and ADTs, and it can be extended to account for further language features.

Differentiating programs that include operators requires us to register implementations of gradients for the different operators, which we will include as attributes. More specifically, for an operator f : <x0, x1, x2...> -> y, the gradient has the type <x0, x1, x2...> -> <y, y -> <x0, x1, x2...>>. Its signature in C++ is runtime::TypedPackedFunc<Expr()>. The signature for this attribute is open for discussion: we can scale other forms to the higher-order case as well.
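To make the shape of this gradient signature concrete, here is a minimal sketch in plain OCaml (not Relay syntax; scalar floats stand in for tensors, and the names add and mul are illustrative only): each operator returns its value together with a pullback that maps the output adjoint back to the input adjoints.

(* Sketch only: an operator x0 * x1 -> y paired with a pullback y -> x0 * x1. *)
let add (x0, x1) =
  let y = x0 +. x1 in
  (* d(x0 + x1)/dx0 = 1 and d(x0 + x1)/dx1 = 1, so the adjoint passes through. *)
  (y, fun dy -> (dy, dy))

let mul (x0, x1) =
  let y = x0 *. x1 in
  (* d(x0 * x1)/dx0 = x1 and d(x0 * x1)/dx1 = x0. *)
  (y, fun dy -> (dy *. x1, dy *. x0))

let () =
  let y, pullback = mul (3.0, 4.0) in
  let dx0, dx1 = pullback 1.0 in
  Printf.printf "y = %g, dy/dx0 = %g, dy/dx1 = %g\n" y dx0 dx1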

First-order AD can be easily optimized and does not require any further language features beyond gradients for operators. AD on higher-order programs in the manner of “Demystifying Differentiable Programming” would require us to add OCaml-style references to Relay, which we plan to submit as a PR soon.
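For readers unfamiliar with what adding “OCaml-style references” would provide, the fragment below (ordinary OCaml, purely illustrative) shows the one feature the higher-order transform relies on: a mutable cell that can be read and updated, used here to accumulate gradient contributions from multiple uses of the same value.

let () =
  let grad = ref 0.0 in            (* create a mutable cell *)
  grad := !grad +. 2.5;            (* adjoint from the first use of the value *)
  grad := !grad +. 1.5;            (* a second use adds its own adjoint *)
  Printf.printf "accumulated gradient = %g\n" !grad   (* prints 4 *)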

We would appreciate the community’s feedback on this outline for implementing automatic differentiation in Relay. We welcome, and would be glad to respond to, any comments regarding further details of the AD implementation and the steps needed to incorporate it into Relay. Thanks @slyubomirsky for helping me write this RFC.

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 4
  • Comments: 19 (18 by maintainers)

Top GitHub Comments

2 reactions
MarisaKirisame commented, Dec 6, 2018

@masahi it is the fastest path.

It uses delimited continuations, so it is confusing, but it can be explained without them.

Essentially, for every Double expression d, we transform it to an expression of type (Double, Ref Double): a tuple which holds two values, the original value and the gradient of that value.

There is also a global Ref (() -> ()) function (it takes no arguments and produces an empty tuple) called backward, which is responsible for reading the gradient and writing it upstream.

Take the expression x + y, where x and y are subexpressions, for example. We will (0) transform x and y into the pairs (x, xref) and (y, yref), and (1) generate:

let zref = new 0;
let old_backward = !backward;
let new_backward = fun() { xref <- !xref + !zref; yref <- !yref + !zref; old_backward() };
backward <- new_backward;
(x + y, zref)

There will be wrapper code which initializes a backward function that does nothing and converts between Double and (Double, Ref Double).

It is essentially the same for tensors.
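A runnable OCaml sketch of the same construction may help (names are made up here, and scalars stand in for tensors): every value carries a gradient reference, and each transformed operation prepends its adjoint update to a global backward thunk, exactly as in the x + y example above.

type dual = { value : float; grad : float ref }

(* The global backward pass; initially it does nothing. *)
let backward : (unit -> unit) ref = ref (fun () -> ())

let lift v = { value = v; grad = ref 0.0 }

(* Transformed addition: compute the primal, then extend the backward pass
   with a step that routes the output adjoint to both inputs. *)
let add x y =
  let z = lift (x.value +. y.value) in
  let old_backward = !backward in
  backward := (fun () ->
    x.grad := !(x.grad) +. !(z.grad);
    y.grad := !(y.grad) +. !(z.grad);
    old_backward ());
  z

(* Wrapper: lift the inputs, seed the output gradient with 1, run the
   accumulated backward pass, and read off the input gradients. *)
let () =
  let x = lift 2.0 and y = lift 3.0 in
  let z = add (add x y) x in       (* z = (x + y) + x *)
  z.grad := 1.0;
  !backward ();
  Printf.printf "dz/dx = %g, dz/dy = %g\n" !(x.grad) !(y.grad)
  (* prints dz/dx = 2, dz/dy = 1 *)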

1 reaction
tqchen commented, Jan 8, 2019

The current AD signature: Array<Expr> (Expr orig_call, Expr out_grad) constructs the gradient AST for a given input.

Imperative AD can make use of this signature along with a high-level taping structure (possibly attached to the framework’s NDArray) to get the gradient, so you don’t need another separate signature for imperative AD.
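To illustrate the taping idea (plain OCaml, not the actual TVM API; all names here are hypothetical): the per-operator rule only turns an output adjoint into input adjoints, while an explicit tape of recorded steps, replayed in reverse execution order, supplies the rest of an imperative AD pass.

type node = { value : float; grad : float ref }

(* The tape records, for each executed op, a closure that propagates the
   output adjoint back to that op's inputs -- the analogue of the per-op rule. *)
let tape : (unit -> unit) list ref = ref []

let fresh v = { value = v; grad = ref 0.0 }

let mul x y =
  let z = fresh (x.value *. y.value) in
  tape := (fun () ->
    x.grad := !(x.grad) +. !(z.grad) *. y.value;
    y.grad := !(y.grad) +. !(z.grad) *. x.value) :: !tape;
  z

let run_backward output =
  output.grad := 1.0;
  (* The newest entry is at the head of the list, so iterating the list
     walks the tape in reverse execution order. *)
  List.iter (fun step -> step ()) !tape

let () =
  let x = fresh 2.0 and y = fresh 5.0 in
  let z = mul x y in
  run_backward z;
  Printf.printf "dz/dx = %g, dz/dy = %g\n" !(x.grad) !(y.grad)
  (* prints dz/dx = 5, dz/dy = 2 *)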
