CuPy JIT failure in ROCm

Reproducible on ROCm 3.5.0 and 4.0.0:

$ pytest tests/cupyx_tests/jit_tests/
========================================================================= test session starts =========================================================================
platform linux -- Python 3.7.8, pytest-6.0.2, py-1.9.0, pluggy-0.13.1
rootdir: /home/leofang/dev/cupy_rocm350, configfile: setup.cfg
collected 9 items                                                                                                                                                     

tests/cupyx_tests/jit_tests/test_raw.py ....F....                                                                                                               [100%]

============================================================================== FAILURES ===============================================================================
_______________________________________________________________ TestRaw.test_raw_multidimensional_array _______________________________________________________________

self = <cupy.cuda.compiler._NVRTCProgram object at 0x7f66aff1fc50>
options = ('-D CUPY_JIT_MODE', '-I/home/leofang/dev/cupy_rocm350/cupy/_core/include', '-I/opt/rocm/include'), log_stream = None

    def compile(self, options=(), log_stream=None):
        try:
            if self.name_expressions:
                for ker in self.name_expressions:
                    nvrtc.addAddNameExpression(self.ptr, ker)
>           nvrtc.compileProgram(self.ptr, options)

cupy/cuda/compiler.py:623: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   cpdef compileProgram(intptr_t prog, options):

cupy_backends/cuda/libs/nvrtc.pyx:133: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   check_status(status)

cupy_backends/cuda/libs/nvrtc.pyx:145: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   raise NVRTCError(status)
E   cupy_backends.cuda.libs.nvrtc.NVRTCError: HIPRTC_ERROR_COMPILATION (6)

cupy_backends/cuda/libs/nvrtc.pyx:64: NVRTCError

During handling of the above exception, another exception occurred:

self = <cupyx_tests.jit_tests.test_raw.TestRaw testMethod=test_raw_multidimensional_array>

    def test_raw_multidimensional_array(self):
        @jit.rawkernel()
        def f(x, y, n_row, n_col):
            tid = jit.threadIdx.x + jit.blockDim.x * jit.blockIdx.x
            ntid = jit.blockDim.x * jit.gridDim.x
            size = n_row * n_col
            for i in range(tid, size, ntid):
                i_row = i // n_col
                i_col = i % n_col
                y[i_row, i_col] = x[i_row, i_col]
    
        n, m = numpy.uint32(12), numpy.uint32(13)
        x = testing.shaped_random((n, m), dtype=numpy.int32, seed=0)
        y = testing.shaped_random((n, m), dtype=numpy.int32, seed=1)
>       f((5,), (6,), (x, y, n, m))

tests/cupyx_tests/jit_tests/test_raw.py:59: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cupyx/jit/_interface.py:71: in __call__
    options=('-D CUPY_JIT_MODE',))
cupy/_core/core.pyx:1956: in cupy._core.core.compile_with_cache
    cpdef function.Module compile_with_cache(
cupy/_core/core.pyx:2021: in cupy._core.core.compile_with_cache
    return cuda.compile_with_cache(
cupy/cuda/compiler.py:430: in compile_with_cache
    name_expressions, log_stream, cache_in_memory)
cupy/cuda/compiler.py:813: in _compile_with_cache_hip
    log_stream, cache_in_memory)
cupy/cuda/compiler.py:272: in compile_using_nvrtc
    name_expressions, log_stream, jitify)
cupy/cuda/compiler.py:255: in _compile
    ptx, mapping = prog.compile(options, log_stream)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <cupy.cuda.compiler._NVRTCProgram object at 0x7f66aff1fc50>
options = ('-D CUPY_JIT_MODE', '-I/home/leofang/dev/cupy_rocm350/cupy/_core/include', '-I/opt/rocm/include'), log_stream = None

    def compile(self, options=(), log_stream=None):
        try:
            if self.name_expressions:
                for ker in self.name_expressions:
                    nvrtc.addAddNameExpression(self.ptr, ker)
            nvrtc.compileProgram(self.ptr, options)
            mapping = None
            if self.name_expressions:
                mapping = {}
                for ker in self.name_expressions:
                    mapping[ker] = nvrtc.getLoweredName(self.ptr, ker)
            if log_stream is not None:
                log_stream.write(nvrtc.getProgramLog(self.ptr))
            # TODO(leofang): use getCUBIN() for _cuda_version >= 11010?
            return nvrtc.getPTX(self.ptr), mapping
        except nvrtc.NVRTCError:
            log = nvrtc.getProgramLog(self.ptr)
            raise CompileException(log, self.src, self.name, options,
>                                  'nvrtc' if not runtime.is_hip else 'hiprtc')
E           cupy.cuda.compiler.CompileException: /tmp/comgr-d17f2a/input/CompileSource:5417:7: error: no member named '_indexing' in 'CArray<int, 2, true, true>'
E               y._indexing(thrust::make_tuple(i_row, i_col)) = x._indexing(thrust::make_tuple(i_row, i_col));
E               ~ ^
E           /tmp/comgr-d17f2a/input/CompileSource:5417:17: error: no member named 'make_tuple' in namespace 'thrust'; did you mean 'std::make_tuple'?
E               y._indexing(thrust::make_tuple(i_row, i_col)) = x._indexing(thrust::make_tuple(i_row, i_col));
E                           ^~~~~~~~~~~~~~~~~~
E                           std::make_tuple
E           /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/tuple:1448:5: note: 'std::make_tuple' declared here
E               make_tuple(_Elements&&... __args)
E               ^
E           /tmp/comgr-d17f2a/input/CompileSource:5417:55: error: no member named '_indexing' in 'CArray<int, 2, true, true>'
E               y._indexing(thrust::make_tuple(i_row, i_col)) = x._indexing(thrust::make_tuple(i_row, i_col));
E                                                               ~ ^
E           /tmp/comgr-d17f2a/input/CompileSource:5417:65: error: no member named 'make_tuple' in namespace 'thrust'; did you mean 'std::make_tuple'?
E               y._indexing(thrust::make_tuple(i_row, i_col)) = x._indexing(thrust::make_tuple(i_row, i_col));
E                                                                           ^~~~~~~~~~~~~~~~~~
E                                                                           std::make_tuple
E           /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/tuple:1448:5: note: 'std::make_tuple' declared here
E               make_tuple(_Elements&&... __args)
E               ^
E           4 errors generated when compiling for gfx906.
E           Error: Failed to compile opencl source (from CL or HIP source to LLVM IR).

cupy/cuda/compiler.py:636: CompileException
======================================================================= short test summary info =======================================================================
FAILED tests/cupyx_tests/jit_tests/test_raw.py::TestRaw::test_raw_multidimensional_array - cupy.cuda.compiler.CompileException: /tmp/comgr-d17f2a/input/CompileSourc...
===================================================================== 1 failed, 8 passed in 8.76s =====================================================================

There are multiple problems:

  1. hipRTC apparently does not recognize -D, so any macros remain undefined:
import cupy as cp


code = r'''

extern "C" __global__ void xyz(float* a) {
    float x = 0;
    #ifdef CUPY_JIT_MODE
    x = 1;
    #else
    x = 2;
    #endif
    a[threadIdx.x] = x;
}
'''

options = ('-DCUPY_JIT_MODE',)
ker = cp.RawKernel(code, 'xyz', options=options, backend='nvrtc')
a = cp.empty((32,), dtype=cp.float32)
ker((1,), (32,), (a,))
print(a)  # -> with backend='nvrtc': [2., 2., ...]; with backend='nvcc': [1., 1., ...]
cp.cuda.Device().synchronize()
  2. The headers cupy/tuple.cuh are not manually unrolled: they should be added to the extra_sources list in cupy/_core/core.pyx
  3. thrust::make_tuple implementation is not recognized (see the compiler errors above): Actually I don’t know how CUDA tests passed, because apparently we don’t include it in the bundled headers. I think hiprtc/hipcc is correct in not recognizing it.

I have a local fix I’m polishing for working around these issues. Not an ideal solution, though.
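
For problem 1, one possible stopgap is to turn each -D option into a #define line at the top of the kernel source before it reaches hipRTC. The sketch below is only an illustration of that idea, not the local fix mentioned above and not part of CuPy: the helper name prepend_defines is hypothetical, and it assumes each macro is passed as a single option string ('-DNAME', '-D NAME', or '-DNAME=VALUE').

```python
def prepend_defines(code, options):
    """Move -D macros from the compiler options into the source text.

    Hypothetical helper (not a CuPy API). Returns the patched source and
    the options that remain after the -D entries are taken out.
    """
    define_lines = []
    remaining = []
    for opt in options:
        if opt.startswith('-D') and len(opt) > 2:
            # Handles '-DNAME', '-D NAME' and '-DNAME=VALUE'.
            name, sep, value = opt[2:].strip().partition('=')
            # A macro given without a value is defined as 1, as compilers do.
            define_lines.append('#define {} {}'.format(name, value if sep else '1'))
        else:
            remaining.append(opt)
    if define_lines:
        code = '\n'.join(define_lines) + '\n' + code
    return code, tuple(remaining)


if __name__ == '__main__':
    src, opts = prepend_defines(
        'extern "C" __global__ void xyz(float* a) {}',
        ('-D CUPY_JIT_MODE', '-I/opt/rocm/include'))
    print(src)
    print(opts)
```

With the reproducer above, this would be applied as code, options = prepend_defines(code, options) right before constructing cp.RawKernel. Whether that is acceptable depends on the kernel: the #define lines shift every line number in the compiler log by the number of macros added.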

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (6 by maintainers)
