[RFC] Symbolic shape runtime
See original GitHub issue

Problem: In real-world workloads, not everything is static. We may want to run inference on images of different sizes or with different batch sizes. In some workloads, we may need to concatenate a different number of embeddings for each instance.
Design: Introduce `upper_bound`
The idea for memory allocation is to use different views of one buffer for different inputs. For example, if the inference input shape is [n, 3, 224, 224] and we hint that n's upper bound is 5, then the first memory allocation is based on [5, 3, 224, 224], but we may create views of [1, 3, 224, 224] or [3, 3, 224, 224] at runtime. In this way we trade some memory for dynamism.
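As a rough illustration of this idea (hypothetical names; a `bytearray` stands in for the runtime's storage pool), the pool is allocated once at the upper-bound shape and zero-copy views are handed out per input:

```python
# Sketch: allocate the storage pool once at the upper-bound shape,
# then hand out zero-copy views for each concrete input size.
ITEMSIZE = 4                 # float32 bytes per element
UPPER_N = 5                  # hint: n's upper bound is 5
ROW = 3 * 224 * 224          # elements per batch entry

# One allocation, sized for the worst case [5, 3, 224, 224].
pool = bytearray(UPPER_N * ROW * ITEMSIZE)

def view_for(n):
    """Zero-copy view covering an [n, 3, 224, 224] input, n <= UPPER_N."""
    assert n <= UPPER_N, "n exceeds the declared upper bound"
    return memoryview(pool)[: n * ROW * ITEMSIZE]

small = view_for(1)   # logically [1, 3, 224, 224]
large = view_for(3)   # logically [3, 3, 224, 224]; same backing buffer
```

No new allocation happens per request; both views alias the same upper-bound buffer, which is exactly the memory-for-dynamism trade described above.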
Implementation:
- During memory planning, given a map of `{var: upper_bound}`, the memory planning process creates two columns, `[storage_id, max_bytes]`, for runtime setup.
- The shape in graph JSON is no longer an int, but an infix string expression of variables and constants.
- When setting up the runtime, the storage pool is built from the `[storage_id, max_bytes]` columns. The infix shape expressions are converted to postfix expressions for fast evaluation. When a shape variable changes, the runtime re-evaluates all expressions containing that variable, then updates each view if it is not already in the cache.
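The steps above can be sketched as follows (illustrative names, not the actual TVM implementation): a shape expression such as `"n * 3 + 1"` is converted to postfix once via a minimal shunting-yard pass, then cheaply re-evaluated whenever a shape variable changes.

```python
# Sketch: convert an infix shape expression to postfix once, then
# re-evaluate it quickly whenever a shape variable is rebound.
OPS = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(tokens):
    """Shunting-yard conversion of a whitespace-tokenized infix expression."""
    out, stack = [], []
    for t in tokens:
        if t in OPS:
            while stack and stack[-1] in OPS and OPS[stack[-1]] >= OPS[t]:
                out.append(stack.pop())
            stack.append(t)
        elif t == "(":
            stack.append(t)
        elif t == ")":
            while stack[-1] != "(":
                out.append(stack.pop())
            stack.pop()
        else:
            out.append(t)
    out.extend(reversed(stack))
    return out

def evaluate(postfix, env):
    """Evaluate a postfix expression; env binds variable names to ints."""
    stack = []
    for t in postfix:
        if t in OPS:
            b, a = stack.pop(), stack.pop()
            stack.append({"+": a + b, "-": a - b,
                          "*": a * b, "/": a // b}[t])
        else:
            stack.append(env[t] if t in env else int(t))
    return stack[0]

pf = to_postfix("n * 3 + 1".split())
assert evaluate(pf, {"n": 2}) == 7
assert evaluate(pf, {"n": 5}) == 16
```

The conversion cost is paid once at setup; per-change evaluation is a single linear scan over the postfix tokens.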
Issue Analytics
- State:
- Created 5 years ago
- Reactions: 1
- Comments: 17 (17 by maintainers)
Top GitHub Comments
Let’s decompose this problem.
In order to support dynamic shapes, we have to fix two problems:
1. Memory management.
2. Kernel support.
Memory management determines whether a program can run at all, and only then affects the efficiency of the runtime. Kernel support, however, only determines efficiency.
If you agree with this, we can further decompose what blocks us: memory management.
There are two ways to handle dynamic memory: allocate statically and create many views on that static memory, or allocate dynamically at runtime; call them AOT and JIT. Ideally, given a sufficiently powerful allocator, the two approaches are equivalent.
In this RFC we take the AOT approach. To know how much memory to allocate for each storage pool node, we again have two options: 1. give a hint, then get a number; 2. pre-compute an expression for each pool node's size. To get things working quickly, this PR uses a `max_bytes` hint to get a fixed size, then creates views. Note that if we could obtain an expression for each storage node, a JIT allocator would become much easier as well.
At this point, we have a solution for memory management. Let's come back to the efficiency problem: kernels.
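Option 1 above, sizing each pool node from the upper-bound hints, might look like this (a hedged sketch with hypothetical names, not the actual planner code):

```python
from functools import reduce

DTYPE_BYTES = {"float32": 4, "int64": 8}

def max_bytes(shape_expr, upper_bounds, dtype="float32"):
    """Worst-case byte size of a tensor whose shape may contain symbolic
    dims. shape_expr is a list of ints or variable names; each variable
    is replaced by its declared upper bound before multiplying."""
    dims = [d if isinstance(d, int) else upper_bounds[d] for d in shape_expr]
    return reduce(lambda a, b: a * b, dims, 1) * DTYPE_BYTES[dtype]

# With the hint n <= 5, the pool entry for [n, 3, 224, 224] is sized as:
assert max_bytes(["n", 3, 224, 224], {"n": 5}) == 5 * 3 * 224 * 224 * 4
```

The `[storage_id, max_bytes]` column mentioned earlier would then record one such worst-case size per storage node.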
Assuming we do tensorization + vectorization + loop partitioning, we will have a good default kernel (most hand-written code today follows this pattern).
Now assume we have a different kernel for each workload. If at compile time we can build a map to the compiled functions, we can dispatch among them when creating views (see `view_cache_`).

----------------------------This is a separation line------------------------
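The role of `view_cache_` can be sketched like this (hypothetical Python stand-in for the runtime's cache; the real implementation lives in C++): a view is keyed by storage id and concrete shape, and only created on a cache miss.

```python
# Sketch of a view cache: when a shape variable changes, look up the
# concrete shape first and only create a new view on a miss.
view_cache = {}
creations = 0   # counts actual view creations, for illustration

def get_or_create_view(storage_id, shape, pool):
    global creations
    key = (storage_id, shape)
    if key not in view_cache:
        creations += 1
        nbytes = 4  # float32
        for d in shape:
            nbytes *= d
        view_cache[key] = memoryview(pool[storage_id])[:nbytes]
    return view_cache[key]

pool = {0: bytearray(5 * 3 * 224 * 224 * 4)}
a = get_or_create_view(0, (2, 3, 224, 224), pool)
b = get_or_create_view(0, (2, 3, 224, 224), pool)  # cache hit, no new view
assert a is b and creations == 1
```

A per-shape compiled-kernel map could be keyed the same way, which is why dispatch fits naturally at view-creation time.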
End-to-end changes:
Python API:
In graph JSON, three attributes are added:
In the runtime, if `symbolic_shape` is true, memory allocation is sized according to the `max_bytes` field. When shape variables change, new views are created. This change works well for dynamic batching, as long as looping over the batch axis in the kernel is not significantly slower.
Superseded by the Relay VM solution; closing for now.