
jit.rawkernel decorated function cannot be used on another device


When I try to run a kernel function decorated with jit.rawkernel() on another device, it raises "CUDA_ERROR_INVALID_HANDLE: invalid resource handle", even though it executes successfully on the first device.

The following code demonstrates the problem more clearly.

  • Code to reproduce
import cupy as cp
import cupyx as cpx
from torch.utils.dlpack import to_dlpack
from torch.utils.dlpack import from_dlpack
import numpy as np
import torch

@cpx.jit.rawkernel()
def add_one_kernel(var0, out, m, n):
    # One thread per column: thread tid walks down column tid and adds 1.
    tid = cpx.jit.threadIdx.x + cpx.jit.blockIdx.x * cpx.jit.blockDim.x
    if tid < n:
        for i in range(m):
            out[i, tid] = var0[i, tid] + 1

def AddOne_cupy(var0, window, valid):
    # Convert the torch tensor to a CuPy array (zero-copy via DLPack),
    # launch the jit.rawkernel on the tensor's own device, then convert back.
    var0 = cp.fromDlpack(to_dlpack(var0))
    with cp.cuda.Device(var0.device.id):
        out = cp.zeros_like(var0)
        add_one_kernel[64, 64](var0, out, var0.shape[0], var0.shape[1])
        out = from_dlpack(out.toDlpack())
        out[:window, :] = np.nan
        out[~valid] = np.nan
    print(f"success on device {var0.device.id}")
    return out

cp.show_config()
### speed test
# First call: tensors on GPU 0 -- works and prints "success on device 0".
var0 = torch.randn(2500, 4000).cuda()
valid = torch.ones(var0.shape).bool().cuda()
window = 20
AddOne_cupy(var0, window, valid)

# Second call: the same tensors on GPU 1 -- raises CUDA_ERROR_INVALID_HANDLE.
var1 = torch.randn(2500, 4000).cuda(1)
valid1 = torch.ones(var0.shape).bool().cuda(1)
AddOne_cupy(var1, window, valid1)

"success on device 0" is printed, but the same call fails on the second device.

  • Error messages, stack traces, or logs
Traceback (most recent call last):
  File "test2.py", line 38, in <module>
    AddOne_cupy(var1, window, valid1, stream1)
  File "test2.py", line 19, in AddOne_cupy
    add_one_kernel[64, 64](var0, out, var0.shape[0], var0.shape[1])
  File "/opt/anaconda3/lib/python3.7/site-packages/cupyx/jit/_interface.py", line 123, in <lambda>
    return lambda *args, **kwargs: self(grid, block, args, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/cupyx/jit/_interface.py", line 111, in __call__
    kern(grid, block, args, shared_mem, stream, enable_cooperative_groups)
  File "cupy/cuda/function.pyx", line 201, in cupy.cuda.function.Function.__call__
  File "cupy/cuda/function.pyx", line 183, in cupy.cuda.function._launch
  File "cupy_backends/cuda/api/driver.pyx", line 306, in cupy_backends.cuda.api.driver.launchKernel
  File "cupy_backends/cuda/api/driver.pyx", line 125, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_INVALID_HANDLE: invalid resource handle

  • Conditions (you can just paste the output of python -c 'import cupy; cupy.show_config()')
    • CuPy version: 9.4.0
    • OS/Platform: Linux-4.18.0-147.8.1.el8_1.x86_64-x86_64-with-centos-8.1.1911-Core
    • CUDA version: 11010 (CUDA 11.1)
    • cuDNN/NCCL version: 8005 (cuDNN 8.0.5)

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
asi1024 commented, Mar 23, 2022

Sorry for my late response. #6575 will fix this issue!
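
Until a release containing that fix is available, one possible workaround (an untested sketch, assuming the compiled-kernel cache is shared across devices) is to create a separate jit.rawkernel-decorated object for each device, so that each one is compiled on the device it is launched on:

import cupy as cp
from cupyx import jit

def make_add_one_kernel():
    # Each call returns an independently decorated kernel object with its
    # own compilation cache.
    @jit.rawkernel()
    def add_one_kernel(var0, out, m, n):
        tid = jit.threadIdx.x + jit.blockIdx.x * jit.blockDim.x
        if tid < n:
            for i in range(m):
                out[i, tid] = var0[i, tid] + 1
    return add_one_kernel

# Hypothetical usage: one kernel object per visible device, selected by the
# id of the device the input array lives on.
kernels = {dev: make_add_one_kernel()
           for dev in range(cp.cuda.runtime.getDeviceCount())}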

1 reaction
yixunzhang commented, Sep 28, 2021

Thanks for the report! It seems we should cache the compiled kernel per device: https://github.com/cupy/cupy/blob/8d9e5aebe8d20faf1809ed11a0d7852be6fdfba6/cupyx/jit/_interface.py#L108-L109

Is your GPU model different between the first one and the second one?

The printed cache contents are as follows:

{(<cupyx.jit._cuda_types.CArray object at 0x7fa5c07287d0>, <cupyx.jit._cuda_types.CArray object at 0x7fa5c0728910>, <cupyx.jit._cuda_types.Scalar object at 0x7fa5c0728950>, <cupyx.jit._cuda_types.Scalar object at 0x7fa5c07288d0>): '\nextern "C" __global__ void add_one_kernel(CArray<float, 2, true, true> var0, CArray<float, 2, true, true> out, int m, int n) {\n  unsigned int tid;\n  int i;\n  tid = (threadIdx.x + (blockIdx.x * blockDim.x));\n  if ((tid < (unsigned int)(n))) {\n    for (int __it = 0, __stop = m, __step = 1; __step >= 0 ? __it < __stop : __it > __stop; __it += __step) {\n      i = __it;\n      out._indexing(thrust::make_tuple(i, tid)) = (var0._indexing(thrust::make_tuple(i, tid)) + 1.0f);\n    }\n  }\n  else {\n  }\n}'}

The device info: [screenshot attached in the original issue]
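
A minimal sketch of that per-device caching idea (illustrative only, not the actual CuPy implementation; the real change is in #6575) would key the cache on the current device id in addition to the argument types:

import cupy as cp

class PerDeviceKernelCache:
    # Illustrative cache keyed by (device id, argument types), so a function
    # handle compiled while device 0 is current is never reused while
    # another device is active.
    def __init__(self):
        self._cache = {}

    def get_or_compile(self, arg_types, compile_func):
        key = (cp.cuda.Device().id, arg_types)
        if key not in self._cache:
            self._cache[key] = compile_func()
        return self._cache[key]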
