
[RFC][AUTOTVM] Auto-Schedule from Compute Declaration


Update (Dec. 25, 2020): This RFC is deprecated. We started another project, “Ansor”, to bring an auto-scheduler to TVM. Ansor is integrated as the tvm.auto_scheduler package in the current code base. See the new RFC and tutorials.
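
For readers landing here today, the replacement workflow lives entirely in tvm.auto_scheduler. A minimal sketch of that API (names as of roughly TVM 0.8; check the current tutorials for the exact signatures):

import tvm
from tvm import te, auto_scheduler

# Register the compute declaration as a workload the auto-scheduler can search over.
@auto_scheduler.register_workload
def vector_add(n):
    A = te.placeholder((n,), name="A")
    B = te.placeholder((n,), name="B")
    C = te.compute((n,), lambda i: A[i] + B[i] * 2, name="C")
    return [A, B, C]

# Create a search task and tune it; no schedule template is written by hand.
task = auto_scheduler.SearchTask(func=vector_add, args=(128,), target="llvm")
task.tune(auto_scheduler.TuningOptions(
    num_measure_trials=64,
    measure_callbacks=[auto_scheduler.RecordToFile("vector_add.json")],
))
sch, args = task.apply_best("vector_add.json")
func = tvm.build(sch, args, target="llvm")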

Auto-Scheduler

TVM decouples kernel implementation into compute and schedule. The compute part is a friendly DSL that can describe algorithms intuitively. However, the schedule part still requires strong expert knowledge and time-consuming tuning to provide decent performance. The tuning process is partially automated by the existing autotvm package, but a human-engineered template is still required.

This RFC proposes a “real” autotvm, which we can call the auto-scheduler. It aims to remove all human effort from the schedule part.
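
To make the compute/schedule split concrete, here is a small matmul in the tvm DSL of that era (a sketch using the same tvm.* namespace as the API example below; the tiling factors are arbitrary). The compute declaration states what to calculate; the schedule states how, and it is this second part that the auto-scheduler aims to generate.

import tvm

# Compute declaration: "what" to compute (a 1024x1024 matmul).
N = 1024
A = tvm.placeholder((N, N), name='A')
B = tvm.placeholder((N, N), name='B')
k = tvm.reduce_axis((0, N), name='k')
C = tvm.compute((N, N), lambda i, j: tvm.sum(A[i, k] * B[k, j], axis=k), name='C')

# Schedule: "how" to compute it. This is the part that normally needs
# expert knowledge (tiling factors, loop order, vectorization, ...).
s = tvm.create_schedule(C.op)
i, j = s[C].op.axis
io, ii = s[C].split(i, factor=32)
jo, ji = s[C].split(j, factor=32)
s[C].reorder(io, jo, ii, ji)
s[C].vectorize(ji)
s[C].parallel(io)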

Proposed Design

The auto-scheduler is built on the existing autotvm package. It generates a template from the compute declaration. This template can then either be

  • Statically filled by heuristic rules and cost functions to provide reasonable performance, or
  • Dynamically tuned by autotvm to provide better performance with some time budget

The auto-scheduler takes a computation graph described by the tvm DSL as input, classifies the read/write patterns and the type of computation of each node, and dispatches the nodes in the DAG to different “meta templates”. The meta templates generate autotvm templates from the compute declaration. There are four types of meta templates: simple reduction, complex reduction, direct compute, and location-tunable compute. The auto-scheduler handles parallelization, vectorization, tiling, and operator fusion.
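
As an illustration of the classification (the category names are the RFC's; which declaration falls into which bucket is my reading of the design, not the exact dispatch logic):

import tvm

n = 1024
A = tvm.placeholder((n, n), name='A')
W = tvm.placeholder((n, n), name='W')

# Direct compute: purely elementwise, no reduction axis
# -> parallelize and vectorize the output loops.
B = tvm.compute((n, n), lambda i, j: A[i, j] * 2 + 1, name='B')

# Simple reduction: each output element reduces over one axis with little data
# reuse (e.g. a row sum) -> parallel outer loop, possibly a split reduction.
k1 = tvm.reduce_axis((0, n), name='k1')
R = tvm.compute((n,), lambda i: tvm.sum(A[i, k1], axis=k1), name='R')

# Complex reduction: reductions with heavy data reuse such as matmul or conv2d
# -> multi-level tiling plus vectorization.
k2 = tvm.reduce_axis((0, n), name='k2')
M = tvm.compute((n, n), lambda i, j: tvm.sum(A[i, k2] * W[k2, j], axis=k2), name='M')

# Location-tunable compute: cheap injective stages (padding, casts, broadcasts)
# whose best compute location is decided by fusing them into their consumer.
P = tvm.compute((n + 2, n + 2),
                lambda i, j: tvm.if_then_else(
                    tvm.all(i >= 1, i < n + 1, j >= 1, j < n + 1),
                    A[i - 1, j - 1], 0.0),
                name='pad')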

The code is available on my branch. The current implementation is in pure Python because autotvm is mainly written in Python, but moving the whole autotvm package to C++ is part of the long-term plan. The code is organized as follows.

API

There are only two user-oriented API calls:

  • autotvm.AutoSchedulerOptions(**kwargs) This configures the auto-scheduler. The arguments include hardware configurations (vector lanes, number of threads, size of shared memory, etc.) and tuning configurations (how many tuning knobs to generate).
  • autotvm.create_schedule(tensors) This is similar to tvm.create_schedule, but returns an already optimized schedule. For example:
A = tvm.placeholder((128,), name='A')
B = tvm.placeholder((128,), name='B')
C = tvm.compute((128,),  lambda i: A[i] + B[i] * 2)

with tvm.target.create('llvm'):
    with autotvm.AutoSchedulerOptions(vec_size=8, num_threads=16):
        s, bufs = autotvm.create_schedule([A, B, C])

# NO SCHEDULE REQUIRED

func = tvm.build(s, bufs)
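
For completeness, the returned schedule builds into an ordinary TVM function, so checking the result is plain TVM usage and nothing specific to the auto-scheduler:

import numpy as np

ctx = tvm.cpu(0)
a = tvm.nd.array(np.random.rand(128).astype('float32'), ctx)
b = tvm.nd.array(np.random.rand(128).astype('float32'), ctx)
c = tvm.nd.array(np.zeros(128, dtype='float32'), ctx)
func(a, b, c)
np.testing.assert_allclose(c.asnumpy(), a.asnumpy() + b.asnumpy() * 2, rtol=1e-5)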

Examples

  1. Tutorial: a tutorial on how to use the auto-scheduler statically or to auto-tune it.
  2. Schedule a whole network: this example is adapted from #2498. It is a LeNet-like convolutional neural network written purely in the tvm DSL (without graph IR). The auto-scheduler also provides basic operator fusion for it. Right now we can only run the forward pass; I am working on fixing the backward pass.

Performance

One reachable performance goal is to replace more than 90% of the schedule code in the existing TOPI with this auto-scheduler. I haven’t done the experiments, but I believe the generated templates can cover the existing search space for most operators (including conv2d, reduction, …).

Another part of the goal is to provide reasonable static performance. In the “Schedule a whole network” example, for a batched forward pass, the current performance is 1.2x slower than out-of-the-box TF + Keras and 10x faster than a naive schedule (fuse and parallelize the outer loops) on an Intel i7-8750H. For static usage, the inputs to the auto-scheduler are parameters for the heuristic rules and hardware configurations. We will gather all of these inputs into a global config, so users can still do some quick “tuning”.
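
As a hedged sketch of what that quick “tuning” could look like (vec_size and num_threads are the options from the API example above; the sweep itself is my illustration, not code from the branch), one can sweep a heuristic parameter, time the kernel with TVM's standard time_evaluator, and keep the best setting:

import numpy as np

ctx = tvm.cpu(0)
args = [tvm.nd.array(np.random.rand(128).astype('float32'), ctx) for _ in range(3)]
best_vec, best_cost = None, float('inf')
for vec_size in (4, 8, 16):
    with tvm.target.create('llvm'):
        with autotvm.AutoSchedulerOptions(vec_size=vec_size, num_threads=16):
            s, bufs = autotvm.create_schedule([A, B, C])
    func = tvm.build(s, bufs)
    # Time the generated kernel and keep the fastest heuristic setting.
    cost = func.time_evaluator(func.entry_name, ctx, number=100)(*args).mean
    if cost < best_cost:
        best_vec, best_cost = vec_size, cost
print('best vec_size:', best_vec)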

Todo List

  • Performance tests and improvements to cover more than 90% of the schedule code in TOPI: improve the heuristic rules to provide better static performance, and run tests to make sure we cover the search space of the existing templates.
  • Improve tuning speed: the current implementation does the analysis and generates the template on the fly, which is expensive and redundant during batched tuning. We should decouple template generation from template tuning and explicitly cache the template (see the sketch after this list).
  • (long-term) Move all autotvm-related code to C++.
  • Improve loop partition to better handle partial tiles and vectorization.
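
On the tuning-speed item, the decoupling could be as simple as memoizing generated templates by a workload key. This is a purely hypothetical sketch (generate_template_from_compute is a placeholder name for the analysis step, not a function in the branch):

# Cache templates by workload key (op name + shapes + dtypes), so batched
# tuning reuses one analysis pass instead of regenerating the template per trial.
_template_cache = {}

def get_template(workload_key, tensors):
    if workload_key not in _template_cache:
        _template_cache[workload_key] = generate_template_from_compute(tensors)
    return _template_cache[workload_key]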

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 35
  • Comments: 20 (17 by maintainers)

Top GitHub Comments

2 reactions
jroesch commented, Apr 3, 2019

@merrymercy How much work is there per backend? Looking over the code now; I will follow up with more questions later.

1 reaction
eqy commented, Apr 5, 2019

@merrymercy Do you think that this is a good time to also make schedules serializable/package them with autotvm style configs? In the past we have had issues where we did not want to merge in changes to schedules because they would break compatibility with tophub, and now it seems that the variety of schedules may also change quickly as auto-schedule is changed. Instead of forcing schedules to be schedule, we can maybe side-step this by packaging schedules together with autotvm configs.
