"access violation" problem
Reporting a bug
- I am using the latest released version of Numba
- I have included below a minimal working reproducer
Thank you all for developing Numba! I'm using Numba 0.49.1 with CUDA 10.2.89 on Python 3.7.5, Windows 10. My code sometimes fails to execute with OSError: exception: access violation reading 0x<non-zero-non-0xFF...address>.
cuda-memcheck.exe shows nothing.
The access violation occurs every time in numba\cuda\cudadrv\nvvm.py in compile(self, **options), line 232: err = self.driver.nvvmCompileProgram(self._handle, len(opts), c_opts)
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
in
39 arr2 = np.arange(16,dtype=np.int8)
40 arr5 = np.zeros(16*16,dtype=np.int8)
---> 41 foo[2, 16](arr1,arr2,arr5) # more threads than array elements
V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in __call__(self, *args)
756 Specialize and invoke this kernel with *args*.
757 '''
--> 758 kernel = self.specialize(*args)
759 cfg = kernel[self.griddim, self.blockdim, self.stream, self.sharedmem]
760 cfg(*args)
V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in specialize(self, *args)
767 argtypes = tuple(
768 [self.typingctx.resolve_argument_type(a) for a in args])
--> 769 kernel = self.compile(argtypes)
770 return kernel
771
V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in compile(self, sig)
786 self.definitions[(cc, argtypes)] = kernel
787 if self.bind:
--> 788 kernel.bind()
789 return kernel
790
V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in bind(self)
528 Force binding to current CUDA context
529 """
--> 530 self._func.get()
531
532 @property
V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in get(self)
405 cufunc = self.cache.get(device.id)
406 if cufunc is None:
--> 407 ptx = self.ptx.get()
408
409 # Link
V:\Program Files\Python\lib\site-packages\numba\cuda\compiler.py in get(self)
376 arch = nvvm.get_arch_option(*cc)
377 ptx = nvvm.llvm_to_ptx(self.llvmir, opt=3, arch=arch,
--> 378 **self._extra_options)
379 self.cache[cc] = ptx
380 if config.DUMP_ASSEMBLY:
V:\Program Files\Python\lib\site-packages\numba\cuda\cudadrv\nvvm.py in llvm_to_ptx(llvmir, **opts)
493 cu.add_module(libdevice.get())
494
--> 495 ptx = cu.compile(**opts)
496 # XXX remove debug_pubnames seems to be necessary sometimes
497 return patch_ptx_debug_pubnames(ptx)
V:\Program Files\Python\lib\site-packages\numba\cuda\cudadrv\nvvm.py in compile(self, **options)
229 c_opts = (c_char_p * len(opts))(*[c_char_p(x.encode('utf8'))
230 for x in opts])
--> 231 err = self.driver.nvvmCompileProgram(self._handle, len(opts), c_opts)
232 self._try_error(err, 'Failed to compile\n')
233
OSError: exception: access violation reading 0x000002087936AB58
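Since the crash happens inside nvvmCompileProgram, it can help to inspect what Numba is handing to NVVM. A minimal sketch using Numba's (real) dump environment variables; they must be set before numba is imported:

```python
import os

# NUMBA_DUMP_LLVM prints the LLVM IR Numba generates (the input that is
# eventually passed to NVVM); NUMBA_DUMP_ASSEMBLY prints the resulting PTX.
# Both must be set in the environment before numba is first imported.
os.environ["NUMBA_DUMP_LLVM"] = "1"
os.environ["NUMBA_DUMP_ASSEMBLY"] = "1"
```

The same effect can be had from the shell, e.g. `NUMBA_DUMP_LLVM=1 python repro.py` (file name here is illustrative).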
If I run the program again, it shows CudaAPIError: [500] Call to cuModuleGetFunction results in CUDA_ERROR_NOT_FOUND.
Here's a reproducer that works on my machine:
import numpy as np
import numba
from numba import cuda

sbox = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=np.int8)
pbox = np.array([15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0], dtype=np.int8)

@cuda.jit(device=True)
def bar(txt, el, r):
    cudas_box = cuda.const.array_like(sbox)
    cudap_box = cuda.const.array_like(pbox)
    if r != 0:
        temp = cuda.local.array(16, numba.int8)
        # the error does not occur if either for loop is removed
        for i in range(16):
            temp[i] = cudas_box[txt[cudap_box[i]]]
        for i in range(16):
            txt[i] = temp[i]
    return txt[10] + el

@cuda.jit(debug=True)
def foo(arr1, arr2, arr5):
    tx = cuda.threadIdx.x
    ty = cuda.blockIdx.x
    bw = cuda.blockDim.x
    pos = tx + ty * bw
    pos1 = pos % 16
    pos2 = (pos // 16) % 16
    if pos < len(arr5):
        roundtext = cuda.local.array(16, numba.int8)
        for i in range(16):
            roundtext[i] = arr1[pos1][i]
        arr5[pos] = bar(roundtext, arr2[pos2], pos % 2)

arr1 = np.arange(16, dtype=np.int8).repeat(16).reshape(16, 16)
arr2 = np.arange(16, dtype=np.int8)
arr5 = np.zeros(16 * 16, dtype=np.int8)
foo[2, 16](arr1, arr2, arr5)
Issue analytics: created 3 years ago; 8 comments (7 by maintainers).
Thanks for the report and nice reproducer! I can reproduce this.

Thanks for linking to the answer. This issue was caused by some transformations done to the IR that made it invalid when it was passed to NVVM. However, for errors in user code, using the simulator can also be a good option. If the simulator is too slow, one can instead add the debug=True, opt=False kwargs to the @cuda.jit decorator and run under compute-sanitizer (e.g. compute-sanitizer python bug.py) - if there are any source lines with invalid accesses, these will be reported. It's not as interactive or deep a tool for CUDA Python code as the CUDA simulator, but it is a lot faster.