ROCm RawModule template kernel with complex
See original GitHub issue
- Conditions (you can just paste the output of python -c 'import cupy; cupy.show_config()')
  - CuPy version 9.1.0
  - OS/Platform AMD ROCm
- Code to reproduce
import cupy
code = r'''
#include <cupy/complex.cuh>
template<typename T>
__global__ void func(T* in_arr) { /* do something */ }
'''
kers = ('func<int>', 'func<complex<double>>')
mod = cupy.RawModule(code=code, options=('--std=c++11',),
name_expressions=kers, translate_cucomplex=False)
ker_int = mod.get_function(kers[1])
- Error messages, stack traces, or logs
When calling RawModule.get_function() using kernels with complex template names:
Traceback (most recent call last):
File "test.py", line 15, in <module>
ker_int = mod.get_function(kers[1])
File "cupy/_core/raw.pyx", line 485, in cupy._core.raw.RawModule.get_function
File "cupy/_core/raw.pyx", line 96, in cupy._core.raw.RawKernel.kernel.__get__
File "cupy/_core/raw.pyx", line 117, in cupy._core.raw.RawKernel._kernel
File "cupy/cuda/function.pyx", line 253, in cupy.cuda.function.Module.get_function
File "cupy/cuda/function.pyx", line 194, in cupy.cuda.function.Function.__init__
File "cupy_backends/cuda/api/driver.pyx", line 269, in cupy_backends.cuda.api.driver.moduleGetFunction
File "cupy_backends/cuda/api/driver.pyx", line 125, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: hipErrorNotFound: hipErrorNotFound
However, if I do not use templates, the kernel runs fine. Is this feature not supported on ROCm?
Issue Analytics
- State:
- Created 2 years ago
- Comments:20 (15 by maintainers)
Top GitHub Comments
Internal ticket: https://ontrack-internal.amd.com/browse/SWDEV-294764
@yxsamliu is the expert here, but this is my understanding (it may be wrong).
In https://docs.nvidia.com/cuda/nvrtc/index.html#accessing-lowered-names, it is mentioned “NVRTC will parse the name expression string as a C++ constant expression at the end of the user program”. If the name expression string is parsed in the context of the program scope, it should get the using directive as you mentioned. But if the name expression string is parsed outside the context of the program scope, you will have to use fully qualified names. This might be the source of the confusion. Since the NV documentation does not make it clear, it might be better to be conservative and support the usage that provides context-free naming and clarity.
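To illustrate the "context-free naming" suggestion, here is a minimal sketch that rewrites name expressions into fully qualified form before they are handed to RawModule. The helper qualify_name_expression and the DEFAULT_NAMESPACES mapping are hypothetical names made up for this example, and the mapping assumes that the unqualified complex in the kernel source comes from a using declaration for thrust::complex; check the headers you include before relying on that.

```python
import re

# Assumption: the unqualified `complex` visible in the kernel source is
# introduced by a using-declaration for `thrust::complex`. Adjust this
# mapping if your headers place the type in a different namespace.
DEFAULT_NAMESPACES = {"complex": "thrust"}


def qualify_name_expression(expr, namespaces=DEFAULT_NAMESPACES):
    """Prefix unqualified type names in a name expression with a namespace.

    Hypothetical helper for illustration; it is not part of CuPy.
    """
    for name, ns in namespaces.items():
        # Match `name` only as a whole identifier that is not already
        # preceded by `::` (i.e. skip names that are already qualified).
        pattern = r"(?<![\w:])" + re.escape(name) + r"(?!\w)"
        expr = re.sub(pattern, ns + "::" + name, expr)
    return expr


kers = ("func<int>", "func<complex<double>>")
qualified = tuple(qualify_name_expression(k) for k in kers)
print(qualified)
```

The resulting tuple could then be passed as name_expressions to cupy.RawModule and used as the keys for get_function, in place of the unqualified names from the reproducer. Whether this avoids hipErrorNotFound on a given ROCm release is not something this sketch can confirm.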