[RFC] NNVMv2 IR - Relay
See original GitHub issue: [RFC] Relay: a new high-level IR for TVM
Relay is a new high level intermediate representation (IR) intended to act as v2.0 of NNVM.
Motivation
Computation graphs are a powerful program representation as demonstrated by the first generation of DL frameworks. Most popular frameworks have employed computation graphs as their input, intermediate representation, and execution data structure.
However, as workloads continue to evolve, the design of our high-level IRs must evolve to better support the needs of developers and users.
Graph-level challenges such as control flow and sub-graphs have become necessary features to natively support and optimize.
The tight coupling between runtime representation and compile-time representation has limited flexibility and frustrated developers; Relay will decouple the representations.
Finally, we believe the high-level IR must be designed in tandem with the low-level IR, allowing the two layers to communicate during compilation to achieve optimal performance.
Design
The first version of NNVM set out to solve some of these challenges, and we view Relay as a second-generation IR designed specifically for integration into the TVM stack as the input layer. Our goal is to focus on TVM as our primary backend, easing development and maintenance for both TVM developers and current NNVM users, as well as enabling new features.
In order to address the challenges presented above, we designed Relay to build on the things computation graphs are good at (purity, dataflow, compositionality), and to improve on the things they struggle with (control flow, sub-graphs, and the runtime/compile-time distinction).
Core IR
Relay is a typed pure functional IR, with a few basic features such as functions, if-then-else control flow, recursion, operator and function calls, and variable binding.
We have iterated on Relay’s design over the past 8 months. This version represents the culmination of our experiments. This PR does not contain all the pieces of the previous version; instead we focus on introducing the core IR, its associated data structures, and a few integral passes.
The core IR is defined in just a few files:
- include/tvm/relay/base.h (the base classes and common data)
- include/tvm/relay/type.h (the type system and all relevant nodes)
- include/tvm/relay/expr.h (the expression language)
Typing
All Relay programs are typed, similar to more conventional languages such as C++. A type system allows us to statically (i.e. at compile time) distinguish between different sorts of values. This means we know whether an expression will evaluate to a tensor, a function (e.g. (float32, float32) -> float32), or a tuple (float32, int32). Furthermore, our type system is shape-generic (i.e. polymorphic over shapes, in the way templates are polymorphic over types).
Type inference and checking take the place of shape inference in traditional computation-graph-style IRs.
This PR implements type inference and checking for Relay; the code can be found in src/tvm/relay/pass/type_infer.cc, with relevant helper utilities in src/tvm/relay/pass.
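As an illustration of what a checker computes, here is a toy Python sketch. The `TensorType`/`FuncType` classes and the `infer_add` rule are simplified, hypothetical stand-ins, not Relay's actual API (the real type nodes live in include/tvm/relay/type.h and also handle broadcasting and shape variables):

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for Relay's type nodes.
@dataclass(frozen=True)
class TensorType:
    shape: tuple
    dtype: str

@dataclass(frozen=True)
class FuncType:
    params: tuple
    ret: TensorType

def infer_add(lhs, rhs):
    # Toy inference rule: addition requires matching shape and dtype,
    # and the result type is statically known before any execution.
    if lhs.shape != rhs.shape or lhs.dtype != rhs.dtype:
        raise TypeError(f"cannot add {lhs} and {rhs}")
    return TensorType(lhs.shape, lhs.dtype)

x = TensorType((10, 32), "float32")
y = TensorType((10, 32), "float32")
# The add operator, viewed as a function, gets a function type.
add_ty = FuncType((x, y), infer_add(x, y))
print(add_ty.ret)
```

The point of the sketch is only that type checking assigns every expression a tensor, function, or tuple type at compile time, with shape mismatches rejected before the program ever runs.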
Control Flow
Relay adds a notion of control flow to the IR, in the form of a simple if (cond) { true_branch } else { false_branch }. Relay requires that the condition evaluate to a single boolean value controlling which branch is taken. if is an expression in Relay, meaning the result of the entire expression is the result of the branch taken.
We introduce this to add a formal way to distinguish between data flow and control flow without having to conflate the two in the representation. Because we separate the control signal, we can easily batch a program without affecting control flow.
The definition of control flow can be found in include/tvm/relay/expr.h.
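To make the expression-oriented semantics concrete, here is a toy interpreter sketch in Python. The `Const`/`If` node classes are hypothetical illustrations, not Relay's actual AST classes:

```python
from dataclasses import dataclass

# Toy AST nodes (not Relay's real classes).
@dataclass
class Const:
    value: object

@dataclass
class If:
    cond: object
    true_branch: object
    false_branch: object

def evaluate(expr):
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, If):
        # The condition must produce a single boolean value; the whole
        # `if` node evaluates to the value of the branch taken.
        branch = expr.true_branch if evaluate(expr.cond) else expr.false_branch
        return evaluate(branch)
    raise ValueError("unknown node")

result = evaluate(If(Const(True), Const(1.0), Const(-1.0)))
print(result)  # 1.0
```

Because `if` is an expression rather than a statement, it can appear anywhere a value is expected, which is what lets control flow live in the same IR as dataflow without conflating the two.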
Abstraction
Relay supports the definition of functions which can be used to represent “sub-graphs” (i.e chunks of reusable computation).
Relay functions are like traditional functions: they have a set of parameters (i.e. placeholders) and a body, which is a chunk of computation involving the parameters (i.e. a sub-graph). We can build a full network/model by composing functions.
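A rough analogy in plain Python, with ordinary functions standing in for Relay functions (the `dense`/`relu`/`mlp` names and the list-of-columns weight layout are made up for illustration):

```python
# Each function is a reusable "sub-graph": parameters plus a body.
def dense(x, w):
    # w is a list of weight columns, each the same length as x.
    return [sum(a * b for a, b in zip(x, col)) for col in w]

def relu(x):
    return [max(v, 0.0) for v in x]

# A full model is just composition of the sub-graphs.
def mlp(x, w1, w2):
    return dense(relu(dense(x, w1)), w2)

out = mlp([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]])
print(out)  # [3.0]
```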
Compilation
The Relay IR is designed as a compile time representation of models. The new features are exposed only in Relay’s abstract syntax tree, and used for compile time program manipulation. We do not intend to use Relay’s IR as a data structure for serious interpretation or execution.
Runtime
These new features increase the expressivity of the current computation model, and one may ask how to execute programs using these features with the existing runtime. Our goal is to introduce Relay as the compiler representation in this PR, and reuse the existing runtime maintaining compatibility on both the frontend and backend. We anticipate a new version of the runtime having native support for Relay’s new constructs in the future.
TVM Co-design
We made an effort to model Relay’s implementation after TVM and to reuse much of the existing infrastructure, in order to provide better compatibility between TOPI operators and Relay programs. One big design decision is reusing the TVM node system to expose the Relay language to Python in the style of TVM. Users who are familiar with TVM’s expression language should feel comfortable working with the Relay AST’s definition in C++ and Python. We also share representations for many data structures; for example, tensor containers (i.e. tvm::runtime::NDArray) and generic attributes are shared between Relay and TVM.
Transitioning from NNVM
We plan on adding a guide for transitioning programs from NNVM to Relay; this is one of the remaining work items before releasing the Relay alpha. The goal is that users can use the Relay operators and builder API to construct Relay programs, and we will follow up with a compatibility layer to make transitioning from NNVM smooth.
For an implementation, see #1672.
Issue Analytics
- State:
- Created 5 years ago
- Reactions: 18
- Comments: 75 (72 by maintainers)
Top GitHub Comments
Thanks @jroesch for the proposal. I am going to elaborate on some of my takes on it. Note that no design is perfect, and that is why we need everyone’s help to work together to evolve the IR.
Specific Technical Points
Some Possible Points to Discuss
These are things that popped into my head; feel free to add more.
@junrushao1994 I am the main designer/implementer of Relay’s automatic differentiation system.
In general, doing non-tracing-based reverse-mode automatic differentiation on arbitrary lambdas is extremely hard. There is only one paper (Reverse-Mode AD in a Functional Framework) that does it; its approach works by traversing the reflected program, is complicated, and is untyped.
We might be able to type it, but that would bring a huge amount of complexity, and optimizing on reflection will not be easier than optimizing a trace. So we actually use a tracing-based approach, which is very similar to (Demystifying Differentiable Programming), except we do not use continuations, only mutation.
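For readers unfamiliar with the idea, here is a minimal sketch of the general technique of tracing-based reverse-mode AD using mutation (no continuations). This is an illustration of the generic tape approach, not Relay's implementation; all class and function names here are invented:

```python
class Var:
    """A traced value: each op records its backward rules via mutation."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0
        self.backward_rules = []  # list of (parent, local_gradient) pairs

    def __mul__(self, other):
        out = Var(self.value * other.value)
        out.backward_rules = [(self, other.value), (other, self.value)]
        return out

    def __add__(self, other):
        out = Var(self.value + other.value)
        out.backward_rules = [(self, 1.0), (other, 1.0)]
        return out

def backward(out):
    # Topologically order the recorded trace, then propagate gradients
    # in reverse so every node's grad is complete before it is used.
    order, seen = [], set()
    def topo(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.backward_rules:
            topo(parent)
        order.append(node)
    topo(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, local in node.backward_rules:
            parent.grad += local * node.grad

x = Var(3.0)
y = Var(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
backward(z)
print(x.grad, y.grad)  # 5.0 3.0
```

The only effect needed is mutation: the forward pass builds the tape through `backward_rules`, and the backward pass accumulates into `grad`, which is exactly the kind of effect the discussion below is about capturing.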
IMO, as there are already effects everywhere (randomness and IO in reinforcement learning, mutation in NLP, distributed training, etc.), the problem is less "whether there should be effects or not" and more "how should we capture effects? A monad, an Eff-like effect system, or not at all (as in OCaml/SML), only in static analysis?" I do agree that it is a problem in its own right, but I think some notion of effects is inevitable.
Back to your particular problem, I think there is a "best of both worlds" solution: introduce a type Ref a. It means a pointer to an a whose content can change; the pointer itself cannot change what it points to (although that can be achieved with Ref(Ref a)). There are three functions on Ref: MkRef : a -> Ref a, GetRef : Ref a -> a, SetRef : Ref a -> a -> (), and possibly updateRef : Ref a -> (a -> a) -> (), which is atomic. Then introduce an effect-free list/dict, translate Python lists into Ref(List a), add a special hook for Ref(List a) in the compiler, and use a custom mutable data structure. We can also change List a to a mutable one (in the compiler) if it is not being shared.
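The proposed Ref operations could be modeled in Python roughly as follows. This is an illustrative sketch only; the names MkRef/GetRef/SetRef/updateRef come from the comment above, and everything else is invented:

```python
class Ref:
    """A mutable cell: the cell is fixed, only its content changes."""
    def __init__(self, value):       # MkRef  : a -> Ref a
        self._value = value

def get_ref(ref):                    # GetRef : Ref a -> a
    return ref._value

def set_ref(ref, value):             # SetRef : Ref a -> a -> ()
    ref._value = value

def update_ref(ref, fn):             # updateRef : Ref a -> (a -> a) -> ()
    ref._value = fn(ref._value)

# A Python list modeled as Ref(List a): "mutation" replaces the
# (effect-free) list inside the cell, while the cell itself is stable.
xs = Ref([1, 2, 3])
update_ref(xs, lambda lst: lst + [4])
print(get_ref(xs))  # [1, 2, 3, 4]
```

The design point is that all mutation is funneled through one well-typed construct, so the compiler can special-case Ref(List a) (e.g. swap in a genuinely mutable structure) whenever it can prove the cell is not shared.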
I think this addresses (1, 2) in solution A, and the previous paragraph addresses (1) in solution B.
Let’s talk about (2) in B. I do agree that references hinder optimization. However, so does reflection, which is the only other way to do higher-order reverse-mode differentiation on higher-order functions. I also postulate that with constant folding, references can be optimized away when the control flow is known; they will only exist at the boundary of unknown function calls. If some variables are only used locally, never leak outside, and their usage does not vary with control flow, they should not generate a Ref.
Of course, this is only a postulate at this point, but we also have a first-order reverse-mode automatic differentiation algorithm implemented, with no Wengert list at runtime. The downside is that it does not work with control flow. We can always add a special case to make sure no Ref is generated there, to achieve better speed.
Also, IMHO we are looking too far into the future. AFAIK no one knows how references, data structures, tensors, and AD will play together when we try to compile efficient code for GPUs. I think we should hold off on such design decisions until a much later phase, when we have a clearer picture.
Does this address your question?