
RuntimeError: CUDA error: no kernel image is available for execution on the device

See original GitHub issue

Hi, I’m using detectron2 on a computing cluster, so the code runs on various GPUs depending on the allocation. detectron2 installed successfully and I’m able to import it from Python.

However, I get the following error on certain (in fact, most) GPUs:

RuntimeError: CUDA error: no kernel image is available for execution on the device (ROIAlign_forward_cuda at /network/home/guptagun/od/detectron2_repo/detectron2/layers/csrc/ROIAlign/ROIAlign_cuda.cu:361)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f2459bbc687 in /network/home/guptagun/anaconda3/envs/detectron/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: detectron2::ROIAlign_forward_cuda(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0xa24 (0x7f23f419189c in /network/home/guptagun/od/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #2: detectron2::ROIAlign_forward(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0xb6 (0x7f23f4132f66 in /network/home/guptagun/od/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x4ec8f (0x7f23f4144c8f in /network/home/guptagun/od/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x49750 (0x7f23f413f750 in /network/home/guptagun/od/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
<omitting python frames>
frame #9: THPFunction_apply(_object*, _object*) + 0x8d6 (0x7f245a4abe96 in /network/home/guptagun/anaconda3/envs/detectron/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

On some GPUs (one of them being a GeForce GTX), the code runs as expected.

I was trying to run demo.py via:

python detectron2_repo/demo/demo.py --config-file detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --input ./leftImg8bit.png --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
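
To narrow down whether the failure is specific to detectron2’s compiled extension or also affects PyTorch’s own CUDA kernels on the allocated GPU, a minimal check (a sketch, assuming it is run in the same conda environment on a failing node) is:

import torch

# Sanity check on the allocated node: plain PyTorch CUDA kernels vs. the
# detectron2 extension. If this runs cleanly but ROIAlign still fails, the
# problem is the compiled detectron2 extension, not the PyTorch install.
device = torch.device("cuda")
print(torch.cuda.get_device_name(0))

x = torch.randn(64, 64, device=device)
y = (x * 2 + 1).relu()     # pointwise ops use PyTorch's own compiled CUDA kernels
print(y.sum().item())      # would raise the same "no kernel image" error if the
                           # PyTorch build lacked kernels for this GPU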

Environment

Output of python -m detectron2.utils.collect_env:

------------------------  --------------------------------------------------
sys.platform              linux
Python                    3.7.5 (default, Oct 25 2019, 15:51:11) [GCC 7.3.0]
Numpy                     1.15.4
Detectron2 Compiler       GCC 7.4
Detectron2 CUDA Compiler  10.0
DETECTRON2_ENV_MODULE     <not set>
PyTorch                   1.3.0
PyTorch Debug Build       False
torchvision               0.4.1a0+d94043a
CUDA available            True
GPU 0                     GeForce GTX TITAN X
CUDA_HOME                 None
Pillow                    5.3.0
cv2                       4.1.0
------------------------  --------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_50,code=compute_50
  - CuDNN 7.6.3
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

When I built detectron2 using python setup.py build develop, TORCH_CUDA_ARCH_LIST was left empty, so it should have been compiled for all architectures (according to https://github.com/facebookresearch/detectron2/issues/62#issuecomment-549432420). What can I do while compiling so that I’m able to use detectron2 on most GPUs, or is this an issue with the compute node I’m using?

Thanks, Gunshi
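
A quick way to see whether the GPU a job lands on is actually covered by the architectures the extension was built for is to compare its compute capability against the NVCC flags in the environment dump above. A minimal sketch (the hard-coded set below is copied from those flags and is an assumption about this particular build):

import torch

# Compute capabilities targeted by the build above, copied from the
# "NVCC architecture flags" line in the environment dump (adjust to your own
# flags). Note that the trailing compute_50 entry also embeds PTX, which the
# driver can JIT-compile for newer GPUs, so this membership test is only a
# rough guide.
built_for = {"3.5", "5.0", "6.0", "6.1", "7.0", "7.5"}

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    cc = f"{major}.{minor}"
    covered = "in" if cc in built_for else "NOT in"
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}, "
          f"compute capability {cc}, {covered} the compiled arch list")

# Newer PyTorch releases (not 1.3.0) can report the arch list directly:
if hasattr(torch.cuda, "get_arch_list"):
    print("torch build arch list:", torch.cuda.get_arch_list())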

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 19

Top GitHub Comments

6 reactions
dhaivat1729 commented, Nov 22, 2019

@ppwwyyxx I am facing the exact same issue, and my PyTorch and detectron2 are compiled with the exact same CUDA versions. I also hit this when I try to run detectron2 on a different GPU than the one I compiled it on: here I compiled on a TITAN X, so it doesn’t work on a TITAN RTX or other GPUs. Note that I haven’t installed using pip, as I am modifying the codebase (only Python files, not touching any CUDA implementation) for my research; I’m not sure if that has any effect, though. Here is the output of python -m detectron2.utils.collect_env. Could you try installing on one GPU and testing on another, to see whether this is a general issue or whether I messed something up?

------------------------  --------------------------------------------------
sys.platform              linux
Python                    3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
Numpy                     1.17.2
Detectron2 Compiler       GCC 7.4
Detectron2 CUDA Compiler  10.0
DETECTRON2_ENV_MODULE     <not set>
PyTorch                   1.3.0
PyTorch Debug Build       False
torchvision               0.4.1a0+d94043a
CUDA available            True
GPU 0                     TITAN V
CUDA_HOME                 /ai/apps/cuda/10.0
NVCC                      Cuda compilation tools, release 10.0, V10.0.130
Pillow                    6.2.0
cv2                       4.1.0
------------------------  --------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.0
  - NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_50,code=compute_50
  - CuDNN 7.6.3
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 
3 reactions
ppwwyyxx commented, Oct 13, 2020

You need to rebuild detectron2 with export TORCH_CUDA_ARCH_LIST=6.0;7.0. Or build on the machine where you run detectron2.
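
Following that suggestion, a minimal sketch of deriving the TORCH_CUDA_ARCH_LIST value from the GPUs visible on the target machine (a hypothetical helper, not part of detectron2):

import torch

# Hypothetical helper: build a TORCH_CUDA_ARCH_LIST value from the GPUs
# visible on this machine, e.g. "6.0;7.0". On a heterogeneous cluster, list
# every compute capability your jobs may be scheduled on instead.
caps = sorted({torch.cuda.get_device_capability(i)
               for i in range(torch.cuda.device_count())})
print(";".join(f"{major}.{minor}" for major, minor in caps))

# Then rebuild the extension with that value exported, for example:
#   export TORCH_CUDA_ARCH_LIST="6.0;7.0"   # quote it: ';' separates shell commands
#   cd detectron2_repo && rm -rf build && python setup.py build develop
# (clearing the old build/ directory forces the CUDA sources to recompile)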


Top Results From Across the Web

  • Cuda error: no kernel image is available for execution on the ...: It means that there is no binary for your GPU card. We only compile binaries for NV cards with CC 3.7 and up....
  • RuntimeError: CUDA error: no kernel image is available for ...: RuntimeError: CUDA error: no kernel image is available for execution on the device. The system I am using is: Ubuntu 18.04. Cuda...
  • Pytorch CUDA error: no kernel image is available for ...: Pytorch CUDA error: no kernel image is available for execution on the device on RTX 3090 with cuda 11.1 · Nvidia version: NVIDIA-SMI...
  • CUDA error: no kernel image is available for execution on the ...: Hi, I'm trying to run OpenNMT-py on an RTX 3090 from vast.ai and getting a CUDA error: Traceback (most recent call last): File ......
  • Facing no kernel image is available for execution on the ...: I am facing the error "no kernel image is available for execution on the device" returned from 'cudaGetLastError()' while trying to execute ......
