[RFC] NNVMv2 IR - Relay
See original GitHub issue: [RFC] Relay: a new high-level IR for TVM
Relay is a new high level intermediate representation (IR) intended to act as v2.0 of NNVM.
Motivation
Computation graphs are a powerful program representation as demonstrated by the first generation of DL frameworks. Most popular frameworks have employed computation graphs as their input, intermediate representation, and execution data structure.
However, as workloads continue to evolve, the design of our high-level IRs must evolve to better support the needs of developers and users.
Graph-level challenges such as control flow and sub-graphs have become necessary features to natively support and optimize.
The tight coupling between runtime representation and compile-time representation has limited flexibility and frustrated developers; Relay will decouple the representations.
Finally, we believe the high-level IR must be designed in tandem with the low-level IR, allowing the two layers to communicate during compilation to achieve optimal performance.
Design
The first version of NNVM set out to solve some of these challenges, and we view Relay as a second-generation IR designed specifically for integration into the TVM stack as the input layer. Our goal is to focus on TVM as our primary backend, easing development and maintenance for both TVM developers and current NNVM users, as well as enabling new features.
In order to address the challenges presented above, we designed Relay to build on the things computation graphs are good at (purity, dataflow, compositionality), and to improve on the things they struggle with (control flow, sub-graphs, and the runtime/compile-time distinction).
Core IR
Relay is a typed pure functional IR, with a few basic features such as functions, if-then-else control flow, recursion, operator and function calls, and variable binding.
We have iterated on Relay’s design over the past 8 months. This version represents the culmination of our experiments. This PR does not contain all the pieces of the previous version; instead we focus on introducing the core IR, its associated data structures, and a few integral passes.
The core IR is defined in just a few files:
- include/tvm/relay/base.h (the base classes and common data)
- include/tvm/relay/type.h (the type system and all relevant nodes)
- include/tvm/relay/expr.h (the expression language)
Typing
All Relay programs are typed, similar to more conventional languages such as C++. A type system allows us to statically (i.e. at compile time) distinguish between different sorts of values. This means we know whether an expression will evaluate to a tensor, a function (e.g. (float32, float32) -> float32), or a tuple (float32, int32). Furthermore, our type system is shape-generic (i.e. polymorphic over shapes, in the way templates are polymorphic over types).
Type inference and checking take the place of shape inference in traditional computation-graph-style IRs.
This PR implements type inference and checking for Relay; the code can be found in src/tvm/relay/pass/type_infer.cc, with relevant helper utilities in src/tvm/relay/pass.
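As an illustration of what a checker computes, here is a toy Python sketch. The `TensorType`/`FuncType` classes and the `infer_add` rule are simplified, hypothetical stand-ins, not Relay's actual API (the real type nodes live in include/tvm/relay/type.h and also handle broadcasting and shape variables):

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for Relay's type nodes.
@dataclass(frozen=True)
class TensorType:
    shape: tuple
    dtype: str

@dataclass(frozen=True)
class FuncType:
    params: tuple
    ret: TensorType

def infer_add(lhs, rhs):
    # Toy inference rule: addition requires matching shape and dtype,
    # and the result type is statically known before any execution.
    if lhs.shape != rhs.shape or lhs.dtype != rhs.dtype:
        raise TypeError(f"cannot add {lhs} and {rhs}")
    return TensorType(lhs.shape, lhs.dtype)

x = TensorType((10, 32), "float32")
y = TensorType((10, 32), "float32")
# The add operator, viewed as a function, gets a function type.
add_ty = FuncType((x, y), infer_add(x, y))
print(add_ty.ret)
```

The point of the sketch is only that type checking assigns every expression a tensor, function, or tuple type at compile time, with shape mismatches rejected before the program ever runs.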
Control Flow
Relay adds a notion of control flow to the IR, in the form of a simple if (cond) { true_branch } else { false_branch }. Relay requires that the condition evaluate to a single boolean value controlling which branch is taken. if is an expression in Relay, meaning the result of the entire expression is the result of the branch taken.
We introduce this to add a formal way to distinguish between data flow and control flow without having to conflate the two in the representation. Because we separate the control signal, we can easily batch a program without affecting control flow.
The definition of control flow can be found in include/tvm/relay/expr.h.
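To make the expression-oriented semantics concrete, here is a toy interpreter sketch in Python. The `Const`/`If` node classes are hypothetical illustrations, not Relay's actual AST classes:

```python
from dataclasses import dataclass

# Toy AST nodes (not Relay's real classes).
@dataclass
class Const:
    value: object

@dataclass
class If:
    cond: object
    true_branch: object
    false_branch: object

def evaluate(expr):
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, If):
        # The condition must produce a single boolean value; the whole
        # `if` node evaluates to the value of the branch taken.
        branch = expr.true_branch if evaluate(expr.cond) else expr.false_branch
        return evaluate(branch)
    raise ValueError("unknown node")

result = evaluate(If(Const(True), Const(1.0), Const(-1.0)))
print(result)  # 1.0
```

Because `if` is an expression rather than a statement, it can appear anywhere a value is expected, which is what lets control flow live in the same IR as dataflow without conflating the two.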
Abstraction
Relay supports the definition of functions which can be used to represent “sub-graphs” (i.e chunks of reusable computation).
Relay functions are like traditional functions: they have a set of parameters (i.e. placeholders) and a body, which is a chunk of computation involving the parameters (i.e. a sub-graph). We can build a full network/model by composing functions.
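A rough analogy in plain Python, with ordinary functions standing in for Relay functions (the `dense`/`relu`/`mlp` names and the list-of-columns weight layout are made up for illustration):

```python
# Each function is a reusable "sub-graph": parameters plus a body.
def dense(x, w):
    # w is a list of weight columns, each the same length as x.
    return [sum(a * b for a, b in zip(x, col)) for col in w]

def relu(x):
    return [max(v, 0.0) for v in x]

# A full model is just composition of the sub-graphs.
def mlp(x, w1, w2):
    return dense(relu(dense(x, w1)), w2)

out = mlp([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]])
print(out)  # [3.0]
```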
Compilation
The Relay IR is designed as a compile time representation of models. The new features are exposed only in Relay’s abstract syntax tree, and used for compile time program manipulation. We do not intend to use Relay’s IR as a data structure for serious interpretation or execution.
Runtime
These new features increase the expressivity of the current computation model, and one may ask how to execute programs using these features with the existing runtime. Our goal is to introduce Relay as the compiler representation in this PR, and reuse the existing runtime maintaining compatibility on both the frontend and backend. We anticipate a new version of the runtime having native support for Relay’s new constructs in the future.
TVM Co-design
We made an effort to model Relay’s implementation after TVM and to reuse much of the existing infrastructure, in order to provide better compatibility between TOPI operators and Relay programs. One big design decision is reusing the TVM node system to expose the Relay language to Python in the style of TVM. Users who are familiar with TVM’s expression language should feel comfortable working with the Relay AST’s definition in C++ and Python. We also share representations for many data structures; for example, tensor containers (i.e. tvm::runtime::NDArray) and generic attributes are shared between Relay and TVM.
Transitioning from NNVM
We plan on adding a guide for transitioning programs from NNVM to Relay; this is one of the remaining work items before releasing the Relay alpha. The goal is that users can use the Relay operators and builder API to construct Relay programs, and we will follow up with a compatibility layer to make transitioning from NNVM smooth.
For an implementation, see #1672.
Issue Analytics
- State:
- Created 5 years ago
- Reactions: 18
- Comments: 75 (72 by maintainers)
Top GitHub Comments
Thanks @jroesch for the proposal. I am going to elaborate on some of my takes on it. Note that no design is perfect, and that is why we need everyone’s help to work together to evolve the IR.
Specific Technical Points
Some Possible Points to Discuss
These are things that popped into my head; feel free to add more.
@junrushao1994 I am the main designer/implementer of Relay’s automatic differentiation system.
In general, doing non-tracing-based reverse-mode automatic differentiation on arbitrary lambdas is extremely hard. There is only one paper (Reverse-Mode AD in a Functional Framework) that does it; its approach works by traversing the reflected program, is complicated, and is untyped.
We might be able to type it, but that would bring a huge amount of complexity, and optimizing on reflection will not be easier than optimizing a trace. So we actually use a tracing-based approach, which is very similar to (Demystifying Differentiable Programming), except we do not use continuations, only mutation.
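For readers unfamiliar with the idea, here is a minimal sketch of the general technique of tracing-based reverse-mode AD using mutation (no continuations). This is an illustration of the generic tape approach, not Relay's implementation; all class and function names here are invented:

```python
class Var:
    """A traced value: each op records its backward rules via mutation."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0
        self.backward_rules = []  # list of (parent, local_gradient) pairs

    def __mul__(self, other):
        out = Var(self.value * other.value)
        out.backward_rules = [(self, other.value), (other, self.value)]
        return out

    def __add__(self, other):
        out = Var(self.value + other.value)
        out.backward_rules = [(self, 1.0), (other, 1.0)]
        return out

def backward(out):
    # Topologically order the recorded trace, then propagate gradients
    # in reverse so every node's grad is complete before it is used.
    order, seen = [], set()
    def topo(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.backward_rules:
            topo(parent)
        order.append(node)
    topo(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, local in node.backward_rules:
            parent.grad += local * node.grad

x = Var(3.0)
y = Var(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
backward(z)
print(x.grad, y.grad)  # 5.0 3.0
```

The only effect needed is mutation: the forward pass builds the tape through `backward_rules`, and the backward pass accumulates into `grad`, which is exactly the kind of effect the discussion below is about capturing.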
IMO, as there are already effects everywhere (randomness and IO in reinforcement learning, mutation in NLP, distributed training, etc.), the problem is less "whether there should be effects or not" and more "how should we capture effects? A monad, an Eff-like effect system, or not at all (as in OCaml/SML), only in static analysis?" I do agree that it is a problem in its own right, but I think some notion of effects is inevitable.
Back to your particular problem, I think there is a "best of both worlds" solution: introduce a type Ref a. It means a pointer to an a whose content can change; the pointer itself cannot change what it points to (although that can be achieved with Ref(Ref a)). There are three functions on Ref: MkRef : a -> Ref a, GetRef : Ref a -> a, SetRef : Ref a -> a -> (), and possibly updateRef : Ref a -> (a -> a) -> (), which is atomic. Then introduce an effect-free list/dict, translate Python lists into Ref(List a), add a special hook for Ref(List a) in the compiler, and use a custom mutable data structure. We can also change List a to a mutable one (in the compiler) if it is not being shared.
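The proposed Ref operations could be modeled in Python roughly as follows. This is an illustrative sketch only; the names MkRef/GetRef/SetRef/updateRef come from the comment above, and everything else is invented:

```python
class Ref:
    """A mutable cell: the cell is fixed, only its content changes."""
    def __init__(self, value):       # MkRef  : a -> Ref a
        self._value = value

def get_ref(ref):                    # GetRef : Ref a -> a
    return ref._value

def set_ref(ref, value):             # SetRef : Ref a -> a -> ()
    ref._value = value

def update_ref(ref, fn):             # updateRef : Ref a -> (a -> a) -> ()
    ref._value = fn(ref._value)

# A Python list modeled as Ref(List a): "mutation" replaces the
# (effect-free) list inside the cell, while the cell itself is stable.
xs = Ref([1, 2, 3])
update_ref(xs, lambda lst: lst + [4])
print(get_ref(xs))  # [1, 2, 3, 4]
```

The design point is that all mutation is funneled through one well-typed construct, so the compiler can special-case Ref(List a) (e.g. swap in a genuinely mutable structure) whenever it can prove the cell is not shared.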
I think this addresses (1, 2) in solution A, and the previous paragraph addresses (1) in solution B.
Let’s talk about (2) in B. I do agree that references hinder optimization. However, so does reflection, which is the only other way to do higher-order reverse-mode differentiation on higher-order functions. I also postulate that with constant folding, references can be optimized away when the control flow is known; they will only exist at the boundary of unknown function calls. If some variables are only used locally, never leak outside, and their usage does not vary with control flow, they should not generate a Ref.
Of course, this is only a postulate at this point, but we also have a first-order reverse-mode automatic differentiation algorithm implemented, with no Wengert list at runtime. The downside is that it does not work with control flow. We can always add a special case to make sure no Ref is generated there, to achieve better speed.
Also, IMHO we are looking too far into the future. AFAIK no one knows how references, data structures, tensors, and AD will play together when we try to compile efficient code for GPUs. I think we should hold off on such design decisions until a much later phase, when we have a clearer picture.
Does this address your question?