`cupy.dot` test failure on ROCm
Recently, tests/cupy_tests/linalg_tests/test_product.py (TestDot) has been failing. Running this test file alone did not reproduce the issue. The failure is observed both in Jenkins and in cupy-rocm-ci-report.
E AssertionError: Only cupy raises error
E
E Traceback (most recent call last):
E File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/cuda/compiler.py", line 650, in compile
E nvrtc.compileProgram(self.ptr, options)
E File "cupy_backends/cuda/libs/nvrtc.pyx", line 136, in cupy_backends.cuda.libs.nvrtc.compileProgram
E cpdef compileProgram(intptr_t prog, options):
E File "cupy_backends/cuda/libs/nvrtc.pyx", line 148, in cupy_backends.cuda.libs.nvrtc.compileProgram
E check_status(status)
E File "cupy_backends/cuda/libs/nvrtc.pyx", line 67, in cupy_backends.cuda.libs.nvrtc.check_status
E raise NVRTCError(status)
E cupy_backends.cuda.libs.nvrtc.NVRTCError: HIPRTC_ERROR_COMPILATION (6)
E
E During handling of the above exception, another exception occurred:
E
E Traceback (most recent call last):
E File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/testing/_helper.py", line 47, in _call_func
E result = impl(*args, **kw)
E File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/tests/cupy_tests/linalg_tests/test_product.py", line 49, in test_dot
E return xp.dot(a, b)
E File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/linalg/_product.py", line 65, in dot
E return a.dot(b, out)
E File "cupy/_core/core.pyx", line 1607, in cupy._core.core.ndarray.dot
E return _linalg.dot(self, b, out)
E File "cupy/_core/_routines_linalg.pyx", line 427, in cupy._core._routines_linalg.dot
E return tensordot_core(a, b, out, n, m, k, ret_shape)
E File "cupy/_core/_routines_linalg.pyx", line 501, in cupy._core._routines_linalg.tensordot_core
E return _integral_tensordot_core(b, a, out, m, n, k, dtype, ret_shape)
E File "cupy/_core/_routines_linalg.pyx", line 327, in cupy._core._routines_linalg._integral_tensordot_core
E kern = _tensordot_core_int_kernel(config, dtype)
E File "cupy/_util.pyx", line 59, in cupy._util.memoize.decorator.ret
E result = f(*args, **kwargs)
E File "cupy/_core/_routines_linalg.pyx", line 302, in cupy._core._routines_linalg._tensordot_core_int_kernel
E ker = mod.get_function(
E File "cupy/_core/raw.pyx", line 470, in cupy._core.raw.RawModule.get_function
E mangled_name = self.module.mapping.get(name)
E File "cupy/_core/raw.pyx", line 394, in cupy._core.raw.RawModule.module.__get__
E return self._module()
E File "cupy/_core/raw.pyx", line 402, in cupy._core.raw.RawModule._module
E mod = _get_raw_module(
E File "cupy/_util.pyx", line 59, in cupy._util.memoize.decorator.ret
E result = f(*args, **kwargs)
E File "cupy/_core/raw.pyx", line 547, in cupy._core.raw._get_raw_module
E mod = cupy._core.core.compile_with_cache(
E File "cupy/_core/core.pyx", line 2015, in cupy._core.core.compile_with_cache
E cpdef function.Module compile_with_cache(
E File "cupy/_core/core.pyx", line 2078, in cupy._core.core.compile_with_cache
E return cuda.compile_with_cache(
E File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/cuda/compiler.py", line 453, in compile_with_cache
E return _compile_with_cache_hip(
E File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/cuda/compiler.py", line 853, in _compile_with_cache_hip
E binary, mapping = compile_using_nvrtc(
E File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/cuda/compiler.py", line 295, in compile_using_nvrtc
E return _compile(source, options, cu_path,
E File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/cuda/compiler.py", line 279, in _compile
E compiled_obj, mapping = prog.compile(options, log_stream)
E File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/cuda/compiler.py", line 667, in compile
E raise CompileException(log, self.src, self.name, options,
E cupy.cuda.compiler.CompileException: /tmp/comgr-917c3a/input/CompileSource:203:27: error: redeclaration of '__hiprtc_16' with a different type: 'void (*const)(int, int, int, const signed char *, const signed char *, signed char *)' vs 'void (*const)(int, int, int, const bool *, const bool *, bool *)'
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<signed char>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:204:27: error: redefinition of '__hiprtc_16'
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<unsigned char>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:205:27: error: redefinition of '__hiprtc_16'
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<short>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:206:27: error: redefinition of '__hiprtc_16'
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<unsigned short>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:207:27: error: redefinition of '__hiprtc_16'
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<int>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:208:27: error: redefinition of '__hiprtc_16'
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<unsigned int>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:209:27: error: redefinition of '__hiprtc_16'
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<long>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:210:27: error: redefinition of '__hiprtc_16'
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<unsigned long>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:211:27: error: redefinition of '__hiprtc_16'
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<long long>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:212:27: error: redefinition of '__hiprtc_16'
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<unsigned long long>;
E ^
E /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E ^
E 10 errors generated when compiling for gfx908.
E Error: Failed to compile opencl source (from CL or HIP source to LLVM IR).
Top GitHub Comments
Please reference internal ticket: SWDEV-308603
I found the cause:
1. cupy_tests/array_api_tests/test_array_object.py calls `dot` (maybe via `__mul__`).
2. tests/cupy_tests/core_tests/test_raw.py clears CuPy's memo.
3. cupy_tests/linalg_tests/test_product.py calls `dot` again. Because the memo was cleared in step 2, this triggers hiprtc a second time for the same kernel, which eventually causes the compile error.
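
The ordering dependence described above can be illustrated with a small pure-Python toy model. This is only a sketch: `memoize`, `clear_memo`, `get_kernel`, and `compile_log` here are hypothetical stand-ins written for this example, not CuPy's actual implementation, and no GPU or hiprtc is involved. It shows only that clearing a memo between two calls forces the "compile" step to run a second time in the same process, which is the precondition for the failure in the traceback.

```python
import functools

# Every cache created by `memoize`, so that all of them can be cleared at once.
_memos = []


def memoize(func):
    """Cache results per argument tuple (toy stand-in, not CuPy's decorator)."""
    cache = {}
    _memos.append(cache)

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


def clear_memo():
    """Empty every memoized cache (hypothetical helper for this sketch)."""
    for cache in _memos:
        cache.clear()


# Records each time the expensive "compile" step actually runs.
compile_log = []


@memoize
def get_kernel(dtype):
    """Pretend to compile a kernel for `dtype`; only runs on a cache miss."""
    compile_log.append(dtype)
    return f"kernel<{dtype}>"


get_kernel("int")   # step 1: first test calls dot, kernel is compiled
get_kernel("int")   # cache hit, no second compile
clear_memo()        # step 2: another test clears the memo
get_kernel("int")   # step 3: later test calls dot, kernel is compiled again

print(compile_log)
```

Running the last test on its own corresponds to a single `get_kernel` call, so the compile step runs only once. That is consistent with the failure not reproducing when test_product.py is run in isolation.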