No implementation of CUDA shared.array error
See original GitHub issue

Reporting a bug
- I have tried using the latest released version of Numba (the most recent is visible in the change log: https://github.com/numba/numba/blob/main/CHANGE_LOG).
- I have included a self-contained code sample to reproduce the problem, i.e. it's possible to run it as 'python bug.py'.
Problem description
When trying to initialize a 7-dimensional shared array whose shape is taken from the input's runtime shape, the Numba compiler throws a typing error.
MWE
import math

import numba
import torch
from numba import cuda


@cuda.jit
def op_numba_c(input):
    # This call fails to compile: the shape values are only known at runtime.
    sharedI = cuda.shared.array(
        shape=(cuda.blockDim.x,
               numba.int32(input.shape[1]), numba.int32(input.shape[2]),
               numba.int32(input.shape[3]), numba.int32(input.shape[4]),
               numba.int32(input.shape[5]), numba.int32(input.shape[6])),
        dtype=numba.float32,
    )


if __name__ == '__main__':
    device = torch.device('cuda:0')
    input = torch.rand(2, 3, 7, 7, 3, 7, 7).to(device)

    def t2nb(ten):
        return numba.cuda.as_cuda_array(ten)

    # TODO: fiddling with threads
    threadsperblock = (16, 16)
    blockspergrid_x = math.ceil(input.shape[0] / threadsperblock[0])
    blockspergrid_y = math.ceil(input.shape[1] / threadsperblock[1])
    blockspergrid = (blockspergrid_x, blockspergrid_y)

    i_2 = input.clone()
    op_numba_c[blockspergrid, threadsperblock](t2nb(i_2))
Stdout
(Full traceback trimmed for clarity)
numba.core.errors.TypingError: Failed in cuda mode pipeline (step: nopython frontend)
No implementation of function Function(<function shared.array at 0x7fd3668320d0>) found for signature:
>>> array(shape=UniTuple(int32 x 7), dtype=class(float32))
There are 2 candidate implementations:
- Of which 2 did not match due to:
Overload of function 'array': File: numba/cuda/cudadecl.py: Line 46.
With argument(s): '(shape=UniTuple(int32 x 7), dtype=class(float32))':
No match.
During: resolving callee type: Function(<function shared.array at 0x7fd3668320d0>)
During: typing of call at /home/sebastien/workspace/MemSE/MemSE/nn/op/numba_test.py (9)
File "MemSE/nn/op/numba_test.py", line 9:
def op_numba_c(input):
sharedI = cuda.shared.array(shape=(cuda.blockDim.x, numba.int32(input.shape[1]), numba.int32(input.shape[2]), numba.int32(input.shape[3]), numba.int32(input.shape[4]), numba.int32(input.shape[5]), numba.int32(input.shape[6])), dtype=numba.float32)
Let me know if I can provide further detail!
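For context, cuda.shared.array() requires the shape to be a simple constant expression known at compile time, which is why the runtime-derived shape above fails to type. A minimal sketch of the constant-shape case that does compile (the kernel name is illustrative, not from the original report):

import numpy as np
from numba import cuda, float32

@cuda.jit
def constant_shape_kernel(out):
    # Compiles fine: the shape is a constant tuple known at compile time.
    shared = cuda.shared.array(shape=(16, 16), dtype=float32)
    x, y = cuda.grid(2)
    if x < 16 and y < 16:
        shared[x, y] = 1.0
        out[x, y] = shared[x, y]

out = cuda.to_device(np.zeros((16, 16), dtype=np.float32))
constant_shape_kernel[(1, 1), (16, 16)](out)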
Dynamic shared memory is needed here - however, for a multi-dimensional array we need to implement reshape, which is tracked by issue #7528. I would like to get that resolved by the next release of Numba.
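As a rough illustration of that approach (a sketch, not the eventual API; the kernel name and helper values are illustrative): dynamic shared memory is requested by declaring the array with shape 0 and passing the byte count as the fourth element of the launch configuration, and until reshape lands the multi-dimensional indexing has to be flattened by hand.

import numpy as np
from numba import cuda, float32

@cuda.jit
def dyn_shared_kernel(inp, out, d2):
    # Shape 0 requests dynamic shared memory; the size in bytes is
    # supplied as the fourth element of the launch configuration.
    buf = cuda.shared.array(0, dtype=float32)
    i = cuda.grid(1)
    if i < inp.size:
        # Flatten the 2-D index by hand, (r, c) -> r * d2 + c,
        # since reshape on shared arrays is not yet implemented.
        buf[i] = inp[i // d2, i % d2]
    cuda.syncthreads()
    if i < out.size:
        out[i] = buf[i]

a = np.arange(12, dtype=np.float32).reshape(3, 4)
out = cuda.to_device(np.zeros(a.size, dtype=np.float32))
nbytes = a.size * a.dtype.itemsize
dyn_shared_kernel[1, 32, 0, nbytes](cuda.to_device(a), out, a.shape[1])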
A couple of other points on the source:
- You don't need to call as_cuda_array() on a Torch tensor - you can pass Torch tensors directly to Numba kernels (Numba will internally call as_cuda_array() on it anyway); see the sketch after this list.
- It might be worth posting a bit about what you're trying to implement and asking for suggestions on https://numba.discourse.group if you have more questions about what to do here - I think using shared memory may not be a good fit for your actual use case.
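For instance (a minimal sketch, assuming a CUDA-enabled Torch build; the kernel name is illustrative), the t2nb() wrapper from the MWE can be dropped entirely:

import torch
from numba import cuda

@cuda.jit
def scale(arr):
    i = cuda.grid(1)
    if i < arr.size:
        arr[i] *= 2.0

t = torch.rand(64, device='cuda:0')
# Torch tensors implement __cuda_array_interface__, so they can be
# passed straight to the kernel; no as_cuda_array() wrapper needed.
scale[1, 64](t)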
Thanks for the input, Graham. I'll close this issue in favor of the discourse topic.