Compilation with cuda.jit randomly fails with segfault
See original GitHub issue
- I have tried using the latest released version of Numba (the most recent version is visible in the change log: https://github.com/numba/numba/blob/master/CHANGE_LOG).
- I have included below a minimal working reproducer (if you are unsure how to write one, see http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports).
I’m porting some code over to Python and using Numba with CUDA. I’m getting random segfaults that appear to happen during CUDA compilation. The code usually runs fine, but maybe one out of five times it segfaults.
This is an example that (sometimes) reproduces the problem.
import numpy as np
import math
from numba import cuda, float32

NUM_SAMPLES = 2
NUM_THREADS = 128

@cuda.jit
def computeValues(data_input, angle, total_size):
    threadID = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    if threadID >= total_size:
        return
    s1 = cuda.local.array(shape=(NUM_SAMPLES, 4), dtype=float32)
    for i in range(NUM_SAMPLES):
        for j in range(4):
            s1[i, j] = data_input[threadID, i, j]
        s1[i, 2] = math.sin(angle) * s1[i, 1] + math.cos(angle) * s1[i, 2]

if __name__ == "__main__":
    total_size = 200
    data_input = np.zeros((total_size, NUM_SAMPLES, 4), dtype='float32')
    BlockSize = int(math.ceil(total_size / NUM_THREADS))
    num_gpu = len(cuda.gpus)
    print(f"number of CUDA devices: {num_gpu}")
    cuda.select_device(2)
    for value in [280.0, 285.0, 290.0]:
        print(f"value = {value} starting\n")
        computeValues[BlockSize, NUM_THREADS](data_input, 0.0, total_size)
Most of the time it works as expected:
$ PYTHONFAULTHANDLER=1 python3 ../test.py
number of CUDA devices: 4
value = 280.0 starting
value = 285.0 starting
value = 290.0 starting
Other times it fails like this:
$ PYTHONFAULTHANDLER=1 python3 ../test.py
number of CUDA devices: 4
value = 280.0 starting
Fatal Python error: Segmentation fault
Current thread 0x00007f169eaf4740 (most recent call first):
File "/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 230 in compile
File "/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 512 in llvm_to_ptx
File "/lib/python3.7/site-packages/numba/cuda/compiler.py", line 451 in get
File "/lib/python3.7/site-packages/numba/cuda/compiler.py", line 480 in get
File "/lib/python3.7/site-packages/numba/cuda/compiler.py", line 603 in bind
File "/lib/python3.7/site-packages/numba/cuda/compiler.py", line 862 in compile
File "/lib/python3.7/site-packages/numba/cuda/compiler.py", line 843 in specialize
File "/lib/python3.7/site-packages/numba/cuda/compiler.py", line 832 in __call__
File "../test.py", line 38 in <module>
Segmentation fault (core dumped)
It looks like this is happening during CUDA compilation. Is there anything I can change in my code to fix this?
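One way to confirm the crash is in compilation rather than in the kernel launch would be to compile eagerly with an explicit signature, which forces the NVVM compile to happen at decoration time instead of at the first call. A sketch only; the signature below is inferred from the call site, assuming the usual eager-compilation syntax:

from numba import cuda

# Sketch: an explicit signature makes @cuda.jit compile the kernel,
# including the NVVM step, immediately at decoration time, so a
# compile-time crash would happen here rather than at first launch.
@cuda.jit("void(float32[:,:,:], float64, int64)")
def computeValues(data_input, angle, total_size):
    # same body as in the reproducer above
    pass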
That was really helpful!
Actually the valgrind error might have been spurious / a red herring. I notice that prior to optimization, there are no memset intrinsics in the IR. However, after optimization, the IR does contain calls to llvm.memset. The LLVM 3.4 IR specification (upon which NVVM is based) expects 5 parameters to the memset intrinsic (ref): (dest, val, len, align, isvolatile). Whereas LLVM 9 only has 4 arguments for memset (ref): (dest, val, len, isvolatile), the alignment having moved out of the argument list and into a parameter attribute.
My understanding right now is that NVVM is parsing the optimized IR as if memset still took 5 parameters, so the value it reads for the last one is junk, leading to an occasional segfault.
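A minimal sketch of how that before/after comparison can be reproduced, assuming the CUDA dispatcher exposes inspect_llvm() the way the CPU dispatcher does, and using llvmlite's legacy pass managers to stand in for the pre-NVVM optimization step:

import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

def optimize(llvm_ir, level=3):
    # Run llvmlite's module-level passes over the IR string, standing in
    # for the optimization applied before the IR is sent to NVVM.
    mod = llvm.parse_assembly(llvm_ir)
    pmb = llvm.PassManagerBuilder()
    pmb.opt_level = level
    pm = llvm.ModulePassManager()
    pmb.populate(pm)
    pm.run(mod)
    return str(mod)

# Compare the IR before and after optimization for each compiled
# specialization of the kernel from the reproducer:
for sig, ir in computeValues.inspect_llvm().items():
    print(sig)
    print("memset before opt:", "llvm.memset" in ir)
    print("memset after opt: ", "llvm.memset" in optimize(ir))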
Also, the segfault goes away if I disable optimization prior to sending the IR to NVVM, as in https://github.com/numba/numba/issues/5576#issuecomment-646548553
I think this is another argument for not optimizing the IR with llvmlite’s LLVM prior to sending the IR to NVVM.
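In terms of the sketch above, the change amounts to dropping the optimize() step and letting NVVM run its own (LLVM 3.4-based) optimizer. Hypothetically something like the following; the actual patch is in the linked comment, and I'm assuming llvm_to_ptx forwards NVVM options such as opt:

from numba.cuda.cudadrv import nvvm

def to_ptx_without_preopt(unoptimized_ir):
    # Send NVVM the unoptimized IR so it never has to parse LLVM 9's
    # 4-argument memset; opt=3 asks NVVM itself to do the optimizing.
    return nvvm.llvm_to_ptx(unoptimized_ir, opt=3)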