
[RFC][Relay] Dynamic Dimensions


Supporting Dynamic Dimensions

I recently opened an RFC proposing a new dynamic runtime (see #2810). A missing piece of the puzzle for supporting fully dynamic models is typing and code generation for tensors with statically unknown shapes.

There are three critical steps to supporting dynamic tensors which I believe are distinct tasks:

  • Extending the type system
  • Computing allocation sizes
  • Code generation for dynamic shapes

First we should consider a simple example which exhibits the hardest version of this problem: how do we type a function such as the following?

operator arange(%start: int32, %stop: int32, %step: int32) -> Tensor((Any,), int32)

fn (%i: int32) {
  let seq = arange(0, %i, 1);
  sum(seq);
}

This example is challenging to type check because arange’s output shape depends on the value of %i: arange(0, %i, 1) produces exactly %i elements, so its shape is only known once %i is. Currently we are able to statically determine the output shape if it is a function of the input’s shape, but not if the output shape is a function of the input’s value itself.

*Note: we could use full dependent typing here, but this introduces new challenges.*

In the cases where the output shape is a symbolic function of the input, or fully dynamic (such as arange), we must generate code which computes the precise size of the output buffer for the return value at execution time.

For example, even if we know f: (Tensor<n>, Tensor<m>) -> Tensor<n + 1, m>, we must generate a function which can allocate space based on the runtime values of n and m. These cases are rare because most programs are closed programs in which we know all shapes statically, but once we introduce a dynamic shape we may have to defer arbitrary shape computation until runtime.
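To make this concrete, here is a minimal plain-Python sketch (the function name is hypothetical and NumPy stands in for TVM’s allocator): the size of the Tensor<n + 1, m> result is only computable once the concrete n and m of the arguments are in hand.

import numpy as np

# Hypothetical sketch (not TVM code): allocate the output of
# f : (Tensor<n>, Tensor<m>) -> Tensor<n + 1, m> once the runtime
# values of n and m are known.
def alloc_output_for_f(a, b, dtype=np.float32):
    n, = a.shape                               # runtime value of n
    m, = b.shape                               # runtime value of m
    return np.empty((n + 1, m), dtype=dtype)   # evaluate the symbolic shape

out = alloc_output_for_f(np.zeros(4), np.zeros(7))
assert out.shape == (5, 7)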

Extending the type system

The first proposed change is to finish support for symbolic shapes.

For example, f can be specialized to concrete shapes, and users can pass around functions and operations which work over symbolic shapes.
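As a rough illustration (plain Python with a hypothetical helper; Relay’s symbolic dimensions are richer than bare variable names), specializing a symbolic shape to concrete dimensions amounts to substituting values for the symbolic variables:

# Hypothetical sketch: a symbolic shape is a tuple of ints and variable
# names; specialization substitutes concrete values for the variables.
def specialize(symbolic_shape, bindings):
    return tuple(bindings[d] if isinstance(d, str) else d for d in symbolic_shape)

# e.g. a function typed over (n, 3, m) specialized to n = 10, m = 4:
assert specialize(("n", 3, "m"), {"n": 10, "m": 4}) == (10, 3, 4)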

The second change is support for an Any shape.

Any represents a tensor dimension which is not statically known.

One possible design is to use a sub-shaping rule. This rule would be similar to the traditional subtyping rule used in popular programming languages.

For example in C++:

class T { ... };
class U : public T { ... };

We know U <: T (that is U is a subtype of T).

In this version of Relay this would mean (1, 2, 3) <: (1, Any, 3).

Unfortunately this typing rule is not what we want. For example, if we have a function which expects a tensor of shape (1, 2, 3), we can no longer call it with a tensor of type (1, Any, 3); we must introduce explicit casting to properly check the runtime type.

Instead we propose borrowing ideas from gradual typing and introducing the notion of “gradual shaping”. Any occurrence of a dynamic dimension is potentially unknown at compile time, and when we demand a concrete dimension we must check it dynamically. This introduces the possibility of runtime shape mismatches but ensures that dynamically shaped and statically shaped code can interact gracefully.
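A minimal sketch of the difference (plain Python, not the Relay type checker; Any and the helpers below are illustrative): under strict sub-shaping a (1, Any, 3) value cannot flow where (1, 2, 3) is required, whereas under gradual shaping the two shapes are consistent at compile time and a dynamic check is emitted wherever a concrete dimension is demanded.

class AnyDim:
    """Stand-in for a dimension that is unknown at compile time."""
Any = AnyDim()

def consistent(expected, actual):
    """Gradual shape consistency: Any is consistent with any dimension,
    while concrete dimensions must match exactly."""
    return len(expected) == len(actual) and all(
        e is Any or a is Any or e == a for e, a in zip(expected, actual))

def check_at_runtime(expected, runtime_shape):
    """The dynamic check emitted wherever a concrete shape is demanded."""
    if not consistent(expected, runtime_shape):
        raise ValueError(f"shape mismatch: expected {expected}, got {runtime_shape}")

assert consistent((1, 2, 3), (1, Any, 3))   # accepted at compile time
check_at_runtime((1, 2, 3), (1, 2, 3))      # passes once the shape is concrete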

Gradual shaping matters because, in many cases, Any would otherwise pollute the types of functions which are themselves static but whose arguments have unknown shapes.

For example, we can optimize the code below under the assumption that %x will always have shape (10, 10), and then perform a single runtime shape check when calling f.

fn @f(%x : Tensor[(10, 10)]) { ... }

let %x: Tensor[(Any, Any)] = ...;
@f(%x)
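Concretely (a plain-Python sketch with hypothetical names, not generated code), the body of @f stays optimized for (10, 10) and the dynamically shaped call site pays a single check:

import numpy as np

# Hypothetical sketch: f's body is compiled assuming a (10, 10) input;
# calling it with a statically unknown shape inserts one runtime check.
def f_static(x):
    return x + 1.0                                # optimized for (10, 10)

def call_f(x):
    if x.shape != (10, 10):                       # the single dynamic check
        raise ValueError(f"expected (10, 10), got {x.shape}")
    return f_static(x)

x = np.zeros((10, 10))                            # statically typed Tensor[(Any, Any)]
call_f(x)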

Computing allocation sizes

The second component is computing the required allocation size. We must do this because TVM requires its output buffers to be preallocated. When the output shape depends on the input’s value, or on a symbolic relationship which has not been specialized, we need to generate a function which computes the output buffer size at runtime.

Array<IndexExpr> ShapeFunc(const Array<Input>& inputs) {
  ...
}

We can then invoke this function during execution to allocate the appropriate output storage.
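For instance, a shape function for arange and its use to preallocate the output might look like the following plain-Python sketch (the names are hypothetical; the real interface is the C++ ShapeFunc signature above):

import numpy as np

# Hypothetical sketch of a shape function for arange: given the runtime
# argument values, compute the output shape so the buffer can be allocated
# before the kernel runs.
def arange_shape_func(start, stop, step):
    length = max(0, -((stop - start) // -step))   # ceil((stop - start) / step)
    return (length,)

def run_arange(start, stop, step):
    out = np.empty(arange_shape_func(start, stop, step), dtype=np.int32)
    for i in range(out.shape[0]):                 # stand-in for the generated kernel
        out[i] = start + i * step
    return out

assert run_arange(0, 7, 2).tolist() == [0, 2, 4, 6]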

Code generation for dynamic shapes

The final challenge is performing code generation. This RFC only proposes solving the representation problem and generating naive TVM code for the cases which use fully symbolic shapes that are not known until runtime.

I believe there are a variety of code-generation strategies, each of which may suit different situations, and we should implement a mechanism for specifying the code generation policy in the face of dynamic shapes separately from shape functions and the typing of dynamic shapes.

Some of the possible code generation strategies:

  • Generate placeholder code when a symbolic relationship holds.
  • Generate placeholder code and make use of shape functions.
  • Implement a bucketing-style approach to code generation (see the sketch after this list).
  • Your ideas here!
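As one illustration of the bucketing idea above (a plain-Python sketch with hypothetical names, not a proposed interface): compile a small set of kernels for fixed bucket sizes, dispatch on the runtime shape, and fall back to a fully dynamic kernel when nothing fits.

import numpy as np

# Hypothetical sketch of bucketing dispatch for a 1-D kernel: precompile
# kernels for a few fixed sizes, pad the input up to the nearest bucket,
# and fall back to a fully dynamic kernel for anything larger.
BUCKETS = (16, 64, 256)

def compile_for(size):
    # stand-in for "generate TVM code specialized to a static shape"
    def kernel(x):
        return x * 2.0
    return kernel

STATIC_KERNELS = {size: compile_for(size) for size in BUCKETS}

def dynamic_kernel(x):
    # stand-in for the naive, fully symbolic code path
    return x * 2.0

def run(x):
    n = x.shape[0]
    for size in BUCKETS:
        if n <= size:
            padded = np.zeros(size, dtype=x.dtype)
            padded[:n] = x
            return STATIC_KERNELS[size](padded)[:n]
    return dynamic_kernel(x)

assert run(np.ones(20)).shape == (20,)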

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 9
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

5 reactions
jroesch commented, May 21, 2019

@ajtulloch @icemelon9 and I have been quietly hacking on a prototype the last few months but have been busy with other things (such as the VM 😄). We are going to start pushing now: I opened a draft PR which will contain the type checking changes, and we will follow up with code generation next.

One thing that would help is validating and criticizing the design from the standpoint of a user who is deploying models day to day. We are also generally interested in thoughts about code generation strategies.

One change that we have decided on is adding the ability to provide type hints when generating IR.

When generating IR from frameworks we don’t necessarily have all type information up front. That is, you may know a static shape for a value, but the shape is not necessarily a stable static property.

For example, imagine a loop variable which starts as a (1, 1) matrix but grows by one in the first dimension on every loop iteration. Eventually the matrix will have the shape (n, 1), where n is the number of loop iterations, but we can’t know that statically.

Our current solution is to generate IR with unknown types plus a type hint, allowing inference to eventually solve for the most general type.

For example, if we were to generate IR for a loop we can do:

def my_loop(%x: ?T, %y: ?U) {
     ...
     can_unify(%x, shape=(1, 1))
     my_loop(concat([%x, %x], axis=0))
     ...
}

These hints could later be used to inform code generation while ensuring that we get the correct typing.
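A plain-Python sketch of the idea (hypothetical, not Relay’s actual solver): unifying the hinted shape (1, 1) with the shape coming back from the recursive call generalizes the disagreeing dimension, so inference settles on the most general type, (Any, 1).

class AnyDim:
    """Dimension left unknown by inference."""
Any = AnyDim()

def unify_dim(a, b):
    """Two observations of the same dimension: equal concrete dims stay
    concrete; disagreeing or unknown dims generalize to Any."""
    if a is Any or b is Any or a != b:
        return Any
    return a

def unify_shape(s1, s2):
    assert len(s1) == len(s2)
    return tuple(unify_dim(a, b) for a, b in zip(s1, s2))

# the hint says (1, 1); the recursive call sees the concat result, e.g. (2, 1):
assert unify_shape((1, 1), (2, 1)) == (Any, 1)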

I will also be at the Diff. Prog. Conf hosted by FB if you are around to talk more in person.

1 reaction
ajtulloch commented, May 17, 2019

This looks like a great design, and should handle all the applications I’m currently thinking of. @jroesch let us know if we can help.

