[RFC][EXPR] Formalize Integer Arithmetic Analysis
See original GitHub issueThe integer analysis (arith namespace) serves as a foundational tool for the low-level code generation. At this point, the analysis code is somewhat scattered in the codebase. This RFC proposes to revamp them.
Here is the list of currently related code:
- Symbolic integer simplification
- Canonical form based simplification(as in CanonicalSimplify)
- Rewrite-rule based simplification(Simplify in HalideIR)
- Symbolic interval bound analysis
- Modulo analysis(DetectLinearEquation)
- Deep equivalence checking
We have also seen issues on this topic.
- https://github.com/dmlc/tvm/issues/2455
- https://github.com/dmlc/tvm/issues/2418
- https://github.com/dmlc/tvm/issues/1711
- https://github.com/dmlc/tvm/issues/2173
Most of the current analysis functions can be put into two categories:
- Symbolic equivalent analysis: maps a symbolic expression e1 to e2 such that e2 and e1 are equivalent to each other (in the case of simplification)
- Relaxed integer set analysis: for e1, provide a set representation S, such that we ensure
e1 \subset S
.- For example, the result of DetectLinearEquation can be viewed as an integer set
{x * factor + base}
- For example, the result of DetectLinearEquation can be viewed as an integer set
Relaxed integer set analysis only requires us to find a superset of the corresponding set. Such relaxation allows us to deal with cases such as division or other functions.
Here is a list of observations about these analysis functions.
- They can use results from each other.
- For example, we can use the result of a modulo analysis when simplifying divisions.
- They can complement each other.
- Constant integer bound and symbolic bound can give us two different perspectives.
- They are context dependent
- We need the context information about(bound, positiveness and other properties) about variables in an expression to get the best analysis result.
Given these observations, the new API is a super analyzer that carries states about the context information, and bundles sub-analyzers.
Proposed sub-analyzers
- CanonicalSimplifier
- RewriteSimplifier
- Deep equality class(via possibly via union-find)
- Symbolic interval finder
- Constant integer bound
- LinearEquation detector(modulo analysis)
- Integer polynomial(optional, its special case of the linear equation should be sufficient)
- Make it easy to plug-in a “new-view” when we have a new analysis
Each sub-analyzer will use memoization to cache the intermediate result. Importantly, each sub-analyzer has a handle(weak reference) back to the super analyzer to allow them to call other sub-analyzers. A user needs to create a super analyzer, populates related context states, and run the analysis.
Followup Goals Enabled by This Proposal
There are several things that formalization can enable.
- Better context-dependent analysis, we can populate context info(e.g., the shape size is non-negative), and use the information in code generation.
- Constant integer bound analysis and automatic integer narrowing.
- So far we use int32 for most indices. Ideally, we want to be able to default to int64, and automatically narrows the integers to int32 when we detect all the arithmetic are within the bound.
- A centralized arithmetic logic that deals with integer analysis
- Improve simplification in general
Proposed Steps
- Design discussion and RFC comment(please reply to this thread).
- Pattern match utility to better port RewriteSimplification (https://github.com/dmlc/tvm/pull/2589)
- Scaffolding of Super Analyzer and sub analyzer structure (https://github.com/dmlc/tvm/pull/2668)
- Populating the new infra
- constant integer bound
- modular set
- RewriteSimplifier
- CanonicalSimplifier
- Migration the existing simplifier to the new stack Comments are welcomed.
Issue Analytics
- State:
- Created 5 years ago
- Reactions:6
- Comments:12 (12 by maintainers)
Top GitHub Comments
https://github.com/dmlc/tvm/pull/3368 finishes the migration of the analysis stack. Thanks to everyone’s effort in helping reviews and contributing code to the new Analyzer.
In my opinion, analyses should be pure functions, something like
(Expr, Context) -> Expr
. This makes the problem of memoization much simpler: we just have to use both the arguments as a key.