
[RFC] Symbolic shape runtime

See the original GitHub issue: apache/tvm#2451

Problem: In real-world workloads, not everything is static. We may want to run inference on images of different sizes, or with different batch sizes. In some workloads, we may need to concatenate a different number of embeddings for each instance.

Design: Introduce an upper_bound hint for memory allocation, and use different views of the same storage for different inputs. For example, if the inference input shape is [n, 3, 224, 224] and we hint that n’s upper bound is 5, the initial memory allocation is based on [5, 3, 224, 224], but at runtime we may create views of [1, 3, 224, 224] or [3, 3, 224, 224]. In this way we trade some memory for dynamism.
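As a rough illustration of the idea (plain NumPy, not the TVM runtime): allocate once at the upper bound, then slice views out of the same buffer for each actual batch size.

import numpy as np

# Allocate once at the upper bound n = 5.
storage = np.empty((5, 3, 224, 224), dtype="float32")

# At runtime, serve smaller batches as views into the same buffer;
# no new allocation happens here.
batch1 = storage[:1]  # view with shape (1, 3, 224, 224)
batch3 = storage[:3]  # view with shape (3, 3, 224, 224)
assert batch3.base is storage  # a view, not a copy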

Implementation:

  1. During memory planning, given a map of {var : upper_bound}, the memory planning process produces two columns, [storage_id, max_bytes], for runtime setup.

  2. The shape in the graph JSON is no longer an int, but an infix string expression over variables and constants.

  3. When the runtime is set up, the storage pool is allocated from the [storage_id, max_bytes] columns. Each infix shape expression is converted to a postfix expression for fast evaluation. When a variable in a shape changes, the runtime re-evaluates all expressions containing that variable, then updates the corresponding view if it is not already in the cache (see the sketch below).
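A minimal sketch of step 3’s expression handling (hypothetical helper names, not the PR’s actual code): convert the infix shape string to postfix once at setup, then re-evaluate it cheaply whenever a shape variable changes. Parentheses are omitted for brevity.

# Shunting-yard conversion and stack-based evaluation for shape
# expressions such as "n1 + n2" or "n1 * 4". Illustrative sketch only.

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(tokens):
    out, ops = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            while ops and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                out.append(ops.pop())
            ops.append(tok)
        else:
            out.append(tok)
    return out + ops[::-1]

def eval_postfix(postfix, env):
    stack = []
    for tok in postfix:
        if tok in PRECEDENCE:
            b, a = stack.pop(), stack.pop()
            stack.append({"+": a + b, "-": a - b,
                          "*": a * b, "/": a // b}[tok])
        else:
            stack.append(int(tok) if tok.isdigit() else env[tok])
    return stack[0]

# "n1 + n2" is compiled once...
postfix = to_postfix(["n1", "+", "n2"])
# ...and re-evaluated whenever a shape variable changes.
assert eval_postfix(postfix, {"n1": 7, "n2": 3}) == 10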

PR: https://github.com/dmlc/tvm/pull/2447

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 17 (17 by maintainers)

Top GitHub Comments

1 reaction
antinucleon commented, Feb 1, 2019

Let’s decompose this problem.

In order to support dynamic shape, we have to fix two problems:

  1. Memory management.

  2. Kernel support.

Memory management determines whether a program can run at all, and only then affects the efficiency of the runtime. Kernel support, however, only determines efficiency.

If you agree with this, we can further decompose what blocks us: memory management.

There are two ways to handle dynamic memory: allocate statically and create many views on that static memory, or allocate dynamically; call these AOT and JIT. Ideally, with a very powerful allocator, the two approaches are equivalent.

In this RFC we take the AOT approach. To know how much memory to allocate for each storage pool node, we again have two options: 1. give a hint, then get a concrete number; 2. precompute an expression for each pool node’s size. To get something working quickly, this PR uses a max_bytes hint to fix each pool’s size, then creates views. Note that if we can get an expression for each storage node, a JIT allocator becomes much easier too.
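A hedged sketch of option 1, the hint approach this PR takes (the helper name max_pool_bytes is illustrative, not the PR’s code): substitute each variable’s upper bound into the symbolic shape and take the resulting byte count as the pool size.

from functools import reduce

DTYPE_BYTES = {"float32": 4, "float16": 2, "int8": 1}

def max_pool_bytes(symbolic_shape, upper_bounds, dtype="float32"):
    # Substitute each variable's upper bound, then multiply out.
    concrete = [upper_bounds[d] if isinstance(d, str) else d
                for d in symbolic_shape]
    return reduce(lambda a, b: a * b, concrete) * DTYPE_BYTES[dtype]

# [n, 3, 224, 224] with n bounded by 5, float32:
assert max_pool_bytes(["n", 3, 224, 224], {"n": 5}) == 5 * 3 * 224 * 224 * 4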

At this point, we already have solutions to deal with memory management. Let’s come back to efficiency problem: kernels.

  1. Assume we are doing tensorization + vectorization + loop partitioning; then we have a good default kernel (most human-written code today is written this way).

  2. Assume we have a different kernel for each workload. If at compile time we can build a map from shapes to compiled functions, we can handle dispatch while creating views (see view_cache_), as sketched below.
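A rough sketch of what such a view cache might look like (illustrative only; the actual view_cache_ in the PR is C++ and may be keyed differently):

import numpy as np

class ViewCache:
    """Illustrative only: map a concrete shape to a (view, kernel)
    pair so repeated shapes pay the setup cost only once."""

    def __init__(self, storage, kernels):
        self.storage = storage   # max-sized buffer, batch axis first
        self.kernels = kernels   # {shape_tuple: compiled_fn}, hypothetical
        self.cache = {}

    def get(self, shape):
        key = tuple(shape)
        if key not in self.cache:
            view = self.storage[: key[0]]     # view along the batch axis
            kernel = self.kernels.get(key)    # shape-specialized kernel,
            self.cache[key] = (view, kernel)  # or None -> use the default
        return self.cache[key]

# A buffer sized for the upper bound serves any smaller batch as a view.
cache = ViewCache(np.empty((5, 3, 224, 224), dtype="float32"), kernels={})
view, kernel = cache.get((3, 3, 224, 224))
assert view.shape == (3, 3, 224, 224)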

----------------------------This is a separation line------------------------

End-to-end changes:

Python API:

import tvm
import tvm.contrib.graph_runtime
from tvm import relay

b1 = tvm.var("n1")
b2 = tvm.var("n2")
b3 = tvm.var("n3")

shape_var_bounds = {
    b1: 128,
    b2: 32,
    b3: 12,
}

x = relay.var("x", shape=[b1, 4, 2, 3], dtype="float32")
y = relay.var("y", shape=[b2, 4, 2, 3], dtype="float32")
z = relay.op.tensor.concatenate([x, y], axis=0)

a = relay.var("a", shape=[b3, 4, 2, 3], dtype="float32")
b = relay.var("b", shape=[27, 4, 2, 3], dtype="float32")
c = relay.op.tensor.concatenate([a, b], axis=0)

out = relay.op.tensor.concatenate([z, c], axis=0)

func = relay.Function([x, y, a, b], out)

# shape_var_bounds is the new build option proposed in this RFC.
graph, mod, param = relay.build_module.build(
    func, target="llvm", shape_var_bounds=shape_var_bounds)

print(graph)

rt = tvm.contrib.graph_runtime.create(graph, mod, tvm.cpu())
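Presumably, the runtime would then accept any inputs whose leading dimensions stay within the declared bounds, inferring n1, n2, n3 from the concrete arrays; a hypothetical usage sketch (this behavior is an assumption, not something spelled out above):

import numpy as np

# Any n1 <= 128, n2 <= 32, n3 <= 12 should be servable from the
# pre-allocated pools without reallocation (assumed behavior).
rt.set_input("x", tvm.nd.array(np.random.rand(7, 4, 2, 3).astype("float32")))
rt.set_input("y", tvm.nd.array(np.random.rand(5, 4, 2, 3).astype("float32")))
rt.set_input("a", tvm.nd.array(np.random.rand(3, 4, 2, 3).astype("float32")))
rt.set_input("b", tvm.nd.array(np.random.rand(27, 4, 2, 3).astype("float32")))
rt.run()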

In the graph JSON, three attributes are added:

"symbolic_shape": 1,
"max_bytes": [
      "list_int",
      [
        12288,
        3072,
        1152,
        2592,
        19104
      ]
    ],
    "var_upper_bound": {
      "n1": 128,
      "n2": 32,
      "n3": 12
    },

In the runtime, if symbolic_shape is true, memory allocation follows the max_bytes field. When the shape variables change, new views are created.
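These max_bytes values check out against the bounds above:

# All values assume float32 (4 bytes per element):
assert 128 * 4 * 2 * 3 * 4 == 12288   # x: [n1, 4, 2, 3], n1 <= 128
assert  32 * 4 * 2 * 3 * 4 == 3072    # y: [n2, 4, 2, 3], n2 <= 32
assert  12 * 4 * 2 * 3 * 4 == 1152    # a: [n3, 4, 2, 3], n3 <= 12
assert  27 * 4 * 2 * 3 * 4 == 2592    # b: [27, 4, 2, 3], static
assert 199 * 4 * 2 * 3 * 4 == 19104   # out: n1 + n2 + n3 + 27 <= 199 rows

Note that only five pools appear for seven tensors (x, y, a, b, z, c, out), so the intermediates z and c evidently share storage with other pools, presumably via the memory planner’s reuse.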

This change is good for dynamic batching, as long as a kernel looping over the batch axis is not significantly slower.

0 reactions
tqchen commented, Oct 8, 2019

Superseded by the Relay VM solution; closing for now.
