`cupy.dot` test failure on ROCm

Recently, tests/cupy_tests/linalg_tests/test_product.py (TestDot) has been failing. Running this test file alone did not reproduce the issue. The failure is observed both in Jenkins and in cupy-rocm-ci-report.

https://raw.githubusercontent.com/kmaehashi/cupy-rocm-ci-report/d0c2d860610c69db37b99b0ff18da33a22c11a69/docs/master/output_test.log

E           AssertionError: Only cupy raises error
E           
E           Traceback (most recent call last):
E             File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/cuda/compiler.py", line 650, in compile
E               nvrtc.compileProgram(self.ptr, options)
E             File "cupy_backends/cuda/libs/nvrtc.pyx", line 136, in cupy_backends.cuda.libs.nvrtc.compileProgram
E               cpdef compileProgram(intptr_t prog, options):
E             File "cupy_backends/cuda/libs/nvrtc.pyx", line 148, in cupy_backends.cuda.libs.nvrtc.compileProgram
E               check_status(status)
E             File "cupy_backends/cuda/libs/nvrtc.pyx", line 67, in cupy_backends.cuda.libs.nvrtc.check_status
E               raise NVRTCError(status)
E           cupy_backends.cuda.libs.nvrtc.NVRTCError: HIPRTC_ERROR_COMPILATION (6)
E           
E           During handling of the above exception, another exception occurred:
E           
E           Traceback (most recent call last):
E             File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/testing/_helper.py", line 47, in _call_func
E               result = impl(*args, **kw)
E             File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/tests/cupy_tests/linalg_tests/test_product.py", line 49, in test_dot
E               return xp.dot(a, b)
E             File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/linalg/_product.py", line 65, in dot
E               return a.dot(b, out)
E             File "cupy/_core/core.pyx", line 1607, in cupy._core.core.ndarray.dot
E               return _linalg.dot(self, b, out)
E             File "cupy/_core/_routines_linalg.pyx", line 427, in cupy._core._routines_linalg.dot
E               return tensordot_core(a, b, out, n, m, k, ret_shape)
E             File "cupy/_core/_routines_linalg.pyx", line 501, in cupy._core._routines_linalg.tensordot_core
E               return _integral_tensordot_core(b, a, out, m, n, k, dtype, ret_shape)
E             File "cupy/_core/_routines_linalg.pyx", line 327, in cupy._core._routines_linalg._integral_tensordot_core
E               kern = _tensordot_core_int_kernel(config, dtype)
E             File "cupy/_util.pyx", line 59, in cupy._util.memoize.decorator.ret
E               result = f(*args, **kwargs)
E             File "cupy/_core/_routines_linalg.pyx", line 302, in cupy._core._routines_linalg._tensordot_core_int_kernel
E               ker = mod.get_function(
E             File "cupy/_core/raw.pyx", line 470, in cupy._core.raw.RawModule.get_function
E               mangled_name = self.module.mapping.get(name)
E             File "cupy/_core/raw.pyx", line 394, in cupy._core.raw.RawModule.module.__get__
E               return self._module()
E             File "cupy/_core/raw.pyx", line 402, in cupy._core.raw.RawModule._module
E               mod = _get_raw_module(
E             File "cupy/_util.pyx", line 59, in cupy._util.memoize.decorator.ret
E               result = f(*args, **kwargs)
E             File "cupy/_core/raw.pyx", line 547, in cupy._core.raw._get_raw_module
E               mod = cupy._core.core.compile_with_cache(
E             File "cupy/_core/core.pyx", line 2015, in cupy._core.core.compile_with_cache
E               cpdef function.Module compile_with_cache(
E             File "cupy/_core/core.pyx", line 2078, in cupy._core.core.compile_with_cache
E               return cuda.compile_with_cache(
E             File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/cuda/compiler.py", line 453, in compile_with_cache
E               return _compile_with_cache_hip(
E             File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/cuda/compiler.py", line 853, in _compile_with_cache_hip
E               binary, mapping = compile_using_nvrtc(
E             File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/cuda/compiler.py", line 295, in compile_using_nvrtc
E               return _compile(source, options, cu_path,
E             File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/cuda/compiler.py", line 279, in _compile
E               compiled_obj, mapping = prog.compile(options, log_stream)
E             File "/global/scratch/kmaeh/cupy-rocm-ci-work/tmp.oE8iI9EKKE/cupy/cupy/cuda/compiler.py", line 667, in compile
E               raise CompileException(log, self.src, self.name, options,
E           cupy.cuda.compiler.CompileException: /tmp/comgr-917c3a/input/CompileSource:203:27: error: redeclaration of '__hiprtc_16' with a different type: 'void (*const)(int, int, int, const signed char *, const signed char *, signed char *)' vs 'void (*const)(int, int, int, const bool *, const bool *, bool *)'
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<signed char>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:204:27: error: redefinition of '__hiprtc_16'
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<unsigned char>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:205:27: error: redefinition of '__hiprtc_16'
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<short>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:206:27: error: redefinition of '__hiprtc_16'
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<unsigned short>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:207:27: error: redefinition of '__hiprtc_16'
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<int>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:208:27: error: redefinition of '__hiprtc_16'
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<unsigned int>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:209:27: error: redefinition of '__hiprtc_16'
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<long>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:210:27: error: redefinition of '__hiprtc_16'
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<unsigned long>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:211:27: error: redefinition of '__hiprtc_16'
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<long long>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:212:27: error: redefinition of '__hiprtc_16'
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<unsigned long long>;
E                                     ^
E           /tmp/comgr-917c3a/input/CompileSource:202:27: note: previous definition is here
E           extern "C" constexpr auto __hiprtc_16 = _tensordot_core_int_kernel<bool>;
E                                     ^
E           10 errors generated when compiling for gfx908.
E           Error: Failed to compile opencl source (from CL or HIP source to LLVM IR).

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 2
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

amathews-amd commented, Oct 25, 2021 (2 reactions)

Please reference internal ticket: SWDEV-308603

takagi commented, Oct 19, 2021 (2 reactions)

I found the cause:

  1. cupy_tests/array_api_tests/test_array_object.py calls dot (possibly via __mul__).
  2. A test in tests/cupy_tests/core_tests/test_raw.py clears CuPy’s memo.
  3. Then cupy_tests/linalg_tests/test_product.py calls dot again. Because the memo was cleared in step 2, this triggers hiprtc compilation a second time, which eventually causes the compile error.
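
The sequence above can be illustrated with a toy model in plain Python. This is only a sketch of the memoization behaviour, not CuPy's code: `memoize`, `clear_memo`, `get_kernel` and `compile_calls` are hypothetical stand-ins written for this example, and no GPU or hiprtc is involved. It shows only that clearing a memo forces the "compile" step to run a second time for the same dtype. It does not model why the second hiprtc compilation fails.

```python
import functools

_memos = []  # every memo dict created by `memoize`, so all can be reset together


def memoize(func):
    """Cache results of `func` by positional arguments (toy version)."""
    memo = {}
    _memos.append(memo)

    @functools.wraps(func)
    def wrapper(*args):
        if args not in memo:
            memo[args] = func(*args)
        return memo[args]

    return wrapper


def clear_memo():
    """Drop all cached results, as a test in another module might do."""
    for memo in _memos:
        memo.clear()


compile_calls = []  # records every run of the expensive "compile" step


@memoize
def get_kernel(dtype):
    # Hypothetical stand-in for a runtime kernel compilation.
    compile_calls.append(dtype)
    return f"kernel<{dtype}>"


get_kernel("int8")   # step 1: the first dot call compiles the kernel
get_kernel("int8")   # served from the memo, no compilation
clear_memo()         # step 2: an unrelated test clears the memo
get_kernel("int8")   # step 3: the next dot call compiles again

print(len(compile_calls))  # prints 2
```

In this model the test order matters: with only steps 1 and 3 in the same process and no clearing in between, the kernel is built once. That matches the report that running test_product.py alone did not reproduce the failure.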