[RFC][Relay][HalideIR] Automatically generate the AST

I have begun experimenting with a new library called astgen. It is meant to replace the large amount of boilerplate the AST requires today, and to let us evolve the node system and its APIs more flexibly.

The first version of this tool will take a Python file like this:

import astgen
import tvm

class Expr:
    pass

@astgen.astgen
class Constant(Expr):
    """
    \\brief Constant tensor, backed by an NDArray on the cpu(0) device.
    \\note Scalar constants are represented by rank-0 constant tensors,
           enabling uniform constant folding over scalars and tensors.
    """

    """The data of the tensor."""
    data: tvm.ndarray.NDArray

astgen.generate_all("expr.h", "tvm::relay")

and produce this C++ file:

namespace tvm {
namespace relay {

/*!
 * \brief Constant tensor, backed by an NDArray on the cpu(0) device.
 * \note Scalar constants are represented by rank-0 constant tensors,
 * enabling uniform constant folding over scalars and tensors.
 *
 */
class Constant;

/*!
 * \brief Constant container.
 *
 */
class ConstantNode : public ExprNode {
 public:
  /*! \brief The data of the tensor. */
  runtime::NDArray data;

  void VisitAttrs(tvm::AttrVisitor* v) final {
    v->Visit("data", &data);
  }
  TVM_DLL static Constant make(runtime::NDArray data);

  static constexpr const char* _type_key = "relay.Constant";
  TVM_DECLARE_NODE_TYPE_INFO(ConstantNode, ExprNode);
};
RELAY_DEFINE_NODE_REF(Constant, ConstantNode, Expr);

} // relay
} // tvm

This work complements Tianqi’s recent proposal to evolve the low-level IR (see #3474).

Specifically, by not hand-writing all of the AST code, we should be able to change the representation flexibly without extensive refactors, and unifying TVM’s IRs should take less effort as time goes on.

A secondary goal of mine is to allow any language with a C-ABI-compatible FFI to construct and manipulate TVM ASTs.

Supporting this would let users build tools in their language of choice without changing how we develop the core of TVM.

Furthermore, this will improve Python interop, as we will no longer have to deal with hidden C++ fields, as is the case today.

Unfortunately, we have relied heavily on C++ objects and C++ datatypes such as std::string, and resolving this is essential to providing an FFI-friendly AST.
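
One possible reading of "FFI friendly" is that every node field has a fixed C layout, for example a pointer plus a length in place of std::string. The sketch below uses Python's `ctypes` to show what such a layout could look like. `FFIString`, `FFIVarNode`, and `make_var` are hypothetical names chosen for illustration; they are not part of TVM or of the astgen branch.

```python
import ctypes


class FFIString(ctypes.Structure):
    """Hypothetical C-compatible stand-in for std::string: pointer plus length."""
    _fields_ = [("data", ctypes.c_char_p), ("size", ctypes.c_size_t)]


class FFIVarNode(ctypes.Structure):
    """Hypothetical node in which every field has a plain C representation."""
    _fields_ = [("type_index", ctypes.c_uint32), ("name_hint", FFIString)]


def make_var(name, type_index=0):
    """Build a node from Python values; ctypes keeps the byte buffer alive."""
    raw = name.encode("utf-8")
    return FFIVarNode(type_index, FFIString(raw, len(raw)))


node = make_var("x", type_index=3)
print(node.type_index, node.name_hint.data, node.name_hint.size)
```

Any language that can describe a C struct could read the same fields, which is the property the paragraph above is after; how ownership of the string buffer would be handled across the boundary is left open here.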

I hope the community can help come up with a design for Relay’s AST using a code-generation-based approach.

My goal is to first replace today’s AST with little to no change, and then evolve it incrementally over time.

I will follow up with more details on my proposed solutions over the next few days.

See this branch for more details: https://github.com/jroesch/tvm/tree/astgen.

Issue Analytics

Created 4 years ago
Reactions: 8
Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions

tqchen commented, Jul 6, 2019

cc @jermainewang @kazimuth @junrushao1994 @icemelon9 @ajtulloch @yzhliu @merrymercy who might be interested in this. Some initial thoughts:

- Naming convention for the class and file hierarchy schema
  - e.g. tvm.schema.expr.py -> include/IR/expr.h
  - Alternatively, allow declarations within each file.
- Decouple schema reading (frontend) from emission, and have an IR of class hierarchy schemas, so that we can have different emitters
  - Think about a Python emitter and a C++ emitter
- We might also want to use it for general objects, including runtime::Object in the VM.
- We still want to allow some user-written boilerplate, given that C++ datatypes can still be used in many of those nodes and we would love to keep them for certain internal data types.
- How to handle docstrings
  - At the moment the Python docstrings are written separately, by manual wrapping. The benefit of manual wrapping is clear docstrings (as they might differ from those in C++).
  - Should we do the same for now and only generate C++ code?
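
The decoupling suggested in the list above can be sketched as a small schema IR with two independent emitters. Everything here is an assumption made for illustration: `FieldSchema`, `NodeSchema`, `emit_cpp`, and `emit_python` are hypothetical names, and the optional `py_doc` field is just one way to let the Python docstring differ from the C++ one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldSchema:
    name: str
    cpp_type: str
    doc: str = ""


@dataclass
class NodeSchema:
    name: str
    base: str
    doc: str = ""
    py_doc: Optional[str] = None  # separate Python docstring; falls back to doc
    fields: List[FieldSchema] = field(default_factory=list)


def emit_cpp(node):
    """C++ emitter: a documented node class with one member per field."""
    lines = [f"/*! \\brief {node.doc} */",
             f"class {node.name}Node : public {node.base}Node {{",
             " public:"]
    for fld in node.fields:
        lines.append(f"  /*! \\brief {fld.doc} */")
        lines.append(f"  {fld.cpp_type} {fld.name};")
    lines.append("};")
    return "\n".join(lines)


def emit_python(node):
    """Python emitter: a wrapper class stub built from the same schema."""
    doc = node.py_doc if node.py_doc is not None else node.doc
    params = "".join(f", {fld.name}" for fld in node.fields)
    lines = [f"class {node.name}({node.base}):",
             f'    """{doc}"""',
             "",
             f"    def __init__(self{params}):"]
    if node.fields:
        lines += [f"        self.{fld.name} = {fld.name}" for fld in node.fields]
    else:
        lines.append("        pass")
    return "\n".join(lines)


constant = NodeSchema(
    name="Constant",
    base="Expr",
    doc="Constant tensor, backed by an NDArray.",
    py_doc="A constant tensor expression.",
    fields=[FieldSchema("data", "runtime::NDArray", "The data of the tensor.")],
)
cpp_src = emit_cpp(constant)
py_src = emit_python(constant)
print(cpp_src)
print(py_src)
```

Because both emitters consume only `NodeSchema`, the frontend that produces it (decorated Python classes, a description file, or something else) can change without touching either emitter.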

1 reaction

were commented, Jul 26, 2019

Yeah, I strongly agree with the point that we need to decouple schema reading and the generation.

This is somewhat like LLVM’s TableGen, which keeps repetitive, regular code in a centralized description file to minimize the changes needed when adding new IR nodes.