Cudasim acting differently than Cuda (when allocating)
Hello!
I had a bug which I eventually tracked down, but while debugging I found that cudasim behaves differently from cuda: the simulator showed no bug, while the real CUDA target did. It looked like this:
```python
from numba import cuda, int32

n = 10  # on the CUDA target, cuda.local.array needs a compile-time-constant shape

@cuda.jit(device=True)
def f(n):
    a = cuda.local.array(n, int32)
    for i in range(n):
        a[i] = i
    return a  # returning the local array -- this is where the two modes diverge

@cuda.jit
def kernel(inp, out):
    a = f(n)
    # doing something with a
```
With cuda, `a` seems to be discarded, but with cudasim it keeps its values, which made debugging quite hard. Maybe in cudasim mode `a` could also be discarded?
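For what it's worth, the pattern that is expected to work (as far as I understand it) is to allocate the local array in the kernel that uses it and have the device function fill it, rather than returning the array. A minimal sketch, assuming a compile-time-constant size `N` and hypothetical names `fill`/`kernel`:

```python
from numba import cuda, int32

N = 10  # compile-time constant, as cuda.local.array requires on the CUDA target

@cuda.jit(device=True)
def fill(a, n):
    # write into the caller's array instead of returning a new local array
    for i in range(n):
        a[i] = i

@cuda.jit
def kernel(out):
    a = cuda.local.array(N, int32)  # allocated in the kernel that uses it
    fill(a, N)
    i = cuda.grid(1)
    if i < out.size and i < N:
        out[i] = a[i]
```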
I have another question: how do you allocate global memory inside a CUDA kernel? The only way I found is to pass the function an array that was allocated on the CPU side, and I can't find a counterpart to cuda.local.array.
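As far as I know, Numba does not expose device-side global allocation (there is no in-kernel equivalent of CUDA C's `malloc`), so the usual approach is to allocate the buffer on the host with `cuda.device_array` and pass it in; the data then lives in global memory on the device without ever being copied from the host. A minimal sketch (the `scratch` buffer and sizes are made up for illustration):

```python
import numpy as np
from numba import cuda

@cuda.jit
def kernel(scratch, out):
    i = cuda.grid(1)
    if i < scratch.size:
        scratch[i] = i           # scratch lives in global memory
        out[i] = 2 * scratch[i]

n = 256
scratch = cuda.device_array(n, dtype=np.int32)  # global memory, allocated host-side
out = cuda.device_array(n, dtype=np.int32)
kernel[(n + 127) // 128, 128](scratch, out)
print(out.copy_to_host()[:5])
```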
And another suggestion: in cudasim you can use print(), but in cuda it throws an error. I think it would be convenient to simply ignore print() in pure cuda mode, so there would be no need to comment out the print statements when switching between cuda and cudasim.
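For reference, more recent Numba versions document limited print() support inside CUDA kernels (comma-separated simple string and scalar arguments), and the simulator can be selected with the `NUMBA_ENABLE_CUDASIM` environment variable instead of editing code. A minimal sketch, assuming a recent Numba:

```python
# Run with NUMBA_ENABLE_CUDASIM=1 for the simulator; unset it for real CUDA.
# The kernel source stays the same in both modes.
from numba import cuda

@cuda.jit
def kernel(out):
    i = cuda.grid(1)
    if i < out.size:
        print("thread", i)  # simple scalar/string arguments only on the CUDA target
        out[i] = i
```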
Thanks!
Issue Analytics
- State: Closed
- Created 3 years ago
- Comments: 10 (5 by maintainers)
Here's the output:
Just checked this with Numba 0.54 RC:
I also noticed that this is returning a local array from a function, which isn’t expected to work (see also discussion in #7090), so I’m going to close this.