
TVM for ROCm 2.x is currently not working


Environment: Ubuntu 18.04 + ROCm 2.2 + TVM (built from current master with ROCM = ON)

I ensured that the TVM library successfully detects and links against ROCm, and the tuning procedure runs successfully. However, executing tvm.build(s, arg_bufs, 'rocm', name='matmul') fails with the following error:

WARNING:autotvm:Too many errors happen in the tuning. Now is in debug mode
Finish loading 500 records
DEBUG:autotvm:Finish loading 500 records
Cannot find config for target=rocm, workload=('tvm_matmul_tune_op', 4, 256, 256). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=rocm, workload=('tvm_matmul_tune_op', 4, 256, 256). A fallback configuration is used, which may bring great performance regression.

Best config:
,None,None
[14:47:54] /host/docker/matmul_tvm/tvm/src/pass/vectorize_loop.cc:362: Detect vector condition in Vectorized Loop, scalarizing...
[14:47:54] /host/docker/matmul_tvm/tvm/src/pass/vectorize_loop.cc:362: Detect vector condition in Vectorized Loop, scalarizing...
Traceback (most recent call last):
  File "matmul_autotvm.py", line 260, in <module>
    search_matmul_config(4, 256, 256, 500) # m, k, n, num_trials
  File "matmul_autotvm.py", line 165, in search_matmul_config
    func = tvm.build(s, arg_bufs, 'rocm', name='matmul')
  File "/host/docker/matmul_tvm/tvm/python/tvm/build_module.py", line 617, in build
    fhost, mdev = _build_for_device(flist, tar, target_host)
  File "/host/docker/matmul_tvm/tvm/python/tvm/build_module.py", line 484, in _build_for_device
    mdev = codegen.build_module(fdevice, str(target)) if fdevice else None
  File "/host/docker/matmul_tvm/tvm/python/tvm/codegen.py", line 36, in build_module
    return _Build(lowered_func, target)
  File "/host/docker/matmul_tvm/tvm/python/tvm/_ffi/_ctypes/function.py", line 206, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (2) /host/docker/matmul_tvm/tvm/build_rocm/libtvm.so(TVMFuncCall+0x61) [0x7f9598de3f01]
  [bt] (1) /host/docker/matmul_tvm/tvm/build_rocm/libtvm.so(+0x14b2e9) [0x7f95986992e9]
  [bt] (0) /host/docker/matmul_tvm/tvm/build_rocm/libtvm.so(+0x231aaa) [0x7f959877faaa]
  File "/host/docker/matmul_tvm/tvm/src/codegen/codegen.cc", line 46
TVMError: Check failed: bf != nullptr: Target rocm is not enabled
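
The final check ("Target rocm is not enabled") fires when no ROCm code generator was registered in libtvm.so, so a useful first step is to confirm what the Python runtime actually sees. A minimal sketch, assuming the 2019-era TVM API where tvm.module.enabled and tvm.rocm exist:

import tvm

# Runtime support: False if libtvm.so was built without the ROCm runtime.
print("rocm runtime enabled:", tvm.module.enabled("rocm"))

# Device visibility: False if no ROCm-capable GPU is reachable.
print("rocm device present:", tvm.rocm(0).exist)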

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 48 (45 by maintainers)

Top GitHub Comments

1 reaction
masahi commented, May 5, 2019

This is not clean, but you can modify this block to handle the rocm target.

if 'cuda' in self.task.target.keys or 'opencl' in self.task.target.keys:
    remote = request_remote(self.key, self.host, self.port)
    ctx = remote.context(str(self.task.target), 0)
    max_dims = ctx.max_thread_dimensions
    kwargs['check_gpu'] = {
        'max_shared_memory_per_block': ctx.max_shared_memory_per_block,
        'max_threads_per_block': ctx.max_threads_per_block,
        'max_thread_x': max_dims[0],
        'max_thread_y': max_dims[1],
        'max_thread_z': max_dims[2],
    }

For rocm, max_shared_memory_per_block should be 48KB, and max threads per block should be 256. Don't forget to add a check like if 'rocm' in self.task.target.keys — a sketch follows below.
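
Putting that together, the added branch might look like the sketch below. The 48KB and 256 figures are the ones quoted above; the per-dimension thread limits are assumptions for illustration, not values from this thread:

if 'rocm' in self.task.target.keys:
    kwargs['check_gpu'] = {
        'max_shared_memory_per_block': 48 * 1024,  # 48KB, per the comment above
        'max_threads_per_block': 256,              # per the comment above
        'max_thread_x': 256,  # assumed; adjust for your GPU
        'max_thread_y': 256,  # assumed
        'max_thread_z': 256,  # assumed
    }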

1 reaction
fundamat commented, Apr 25, 2019

@masahi

You can find more information here: https://llvm.org/docs/AMDGPUUsage.html#code-object-metadata

I simply added -mattr=-code-object-v3 to the target options in BuildAMDGPU at codegen_amdgpu.cc:182:

config << "-mtriple=amdgcn-amd-amdhsa-hcc -mcpu=gfx"
       << DetectROCMComputeVersion(target) << " -mattr=-code-object-v3 "
       << target.substr(4, target.length() - 4);
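
Per the linked LLVM documentation, disabling the code-object-v3 target feature makes LLVM fall back to the older version 2 code-object format, which the rest of this ROCm 2.x toolchain evidently still expected. After rebuilding libtvm.so with the patch, a quick smoke test is to compile and run a trivial kernel for the rocm target; a sketch, again assuming the 2019-era TVM API:

import numpy as np
import tvm

# Trivial elementwise kernel, bound to GPU threads so it can be
# compiled for a GPU target.
n = 1024
A = tvm.placeholder((n,), name='A')
B = tvm.compute((n,), lambda i: A[i] + 1.0, name='B')
s = tvm.create_schedule(B.op)
bx, tx = s[B].split(B.op.axis[0], factor=64)
s[B].bind(bx, tvm.thread_axis("blockIdx.x"))
s[B].bind(tx, tvm.thread_axis("threadIdx.x"))

# This is the call that previously failed with "Target rocm is not enabled".
fadd = tvm.build(s, [A, B], 'rocm', name='vadd')

# Run on the first ROCm device and verify the result.
ctx = tvm.rocm(0)
a = tvm.nd.array(np.random.uniform(size=n).astype(A.dtype), ctx)
b = tvm.nd.array(np.zeros(n, dtype=A.dtype), ctx)
fadd(a, b)
np.testing.assert_allclose(b.asnumpy(), a.asnumpy() + 1.0)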