Model takes a very long time to load and tutorial script fails
If you do not know the root cause of the problem / bug, and wish someone to help you, please include:
When I try to run the code from the Detectron2 Tutorial Colab, the model takes an extremely long time to load and then crashes with a CUDA error / segmentation fault.
To Reproduce
- what changes you made / what code you wrote: None
- what command you run: Taken from the Detectron2 Tutorial Colab
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import matplotlib.pyplot as plt
import numpy as np
import cv2

# read the test image
im = cv2.imread("./input.jpg")

from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg

# load the Mask R-CNN config and point it at the pre-trained COCO weights
cfg = get_cfg()
cfg.merge_from_file("configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"

# build the model and run inference on the image
predictor = DefaultPredictor(cfg)
outputs = predictor(im)
- what you observed (full logs are preferred):
- The step DefaultPredictor(cfg) takes an extremely long time (>15 minutes). Specifically, the call to build_model(cfg) inside the class's __init__() is what takes this long to complete.
- Minimal usage of the GPU and maximal usage of the CPU: top lists the python process as taking up 100% of CPU power and approx. 3.5% of memory, while the GPU holds only approximately 500MB out of the available 16GB. It appears that PyTorch is attempting to execute everything on the CPU.
- Once the model is finally built, attempting to run prediction results in an error:
WARNING [10/11 07:45:38 d2.config.compat]: Config 'configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml' has no VERSION. Assuming it to be compatible with latest v2.
Traceback (most recent call last):
File "test.py", line 19, in <module>
outputs = predictor(im)
File "/home/jan/miniconda3/envs/detectron/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
return func(*args, **kwargs)
File "/home/jan/detectron2/detectron2/engine/defaults.py", line 171, in __call__
height, width = original_image.shape[:2]
AttributeError: 'NoneType' object has no attribute 'shape'
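One likely explanation for the AttributeError, independent of the CUDA question: cv2.imread returns None instead of raising when the file at ./input.jpg cannot be read, and DefaultPredictor.__call__ then fails on original_image.shape. A minimal guard (a sketch, reusing the path from the script above) would surface a missing or unreadable image before the predictor is even built:

import cv2

im = cv2.imread("./input.jpg")
if im is None:
    # cv2.imread signals failure by returning None rather than raising an exception
    raise FileNotFoundError("Could not read ./input.jpg - check that the file exists and is a valid image")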
However, I’ve checked all CUDA versions and everything points to CUDA 10.1, so I don’t think this is a version mismatch:
$ conda list cuda
# packages in environment at /home/jan/miniconda3/envs/detectron:
#
# Name Version Build Channel
cudatoolkit 10.1.168 0
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168
$ echo $LD_LIBRARY_PATH
:/usr/local/cuda-10.1.back/lib64
$ nvidia-smi
Fri Oct 11 07:45:41 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:00:04.0 Off | 0 |
| N/A 49C P0 41W / 250W | 269MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... Off | 00000000:00:05.0 Off | 0 |
| N/A 51C P0 41W / 250W | 10MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 16135 C python 259MiB |
+-----------------------------------------------------------------------------+
All version requirements from the install page are met:
$ python --version
Python 3.6.9 :: Anaconda, Inc.
$ conda list
# packages in environment at /home/jan/miniconda3/envs/detectron:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
absl-py 0.8.1 pypi_0 pypi
backcall 0.1.0 py36_0
blas 1.0 mkl
bzip2 1.0.8 h7b6447c_0
ca-certificates 2019.8.28 0
cairo 1.14.12 h8948797_3
certifi 2019.9.11 py36_0
cffi 1.12.3 py36h2e261b9_0
cloudpickle 1.2.2 pypi_0 pypi
cudatoolkit 10.1.168 0
cycler 0.10.0 pypi_0 pypi
cython 0.29.13 pypi_0 pypi
decorator 4.4.0 py36_1
detectron2 0.1 dev_0 <develop>
ffmpeg 4.0 hcdf2ecd_0
fontconfig 2.13.0 h9420a91_0
freeglut 3.0.0 hf484d3e_5
freetype 2.9.1 h8a8886c_1
fvcore 0.1 pypi_0 pypi
glib 2.56.2 hd408876_0
graphite2 1.3.13 h23475e2_0
grpcio 1.24.1 pypi_0 pypi
harfbuzz 1.8.8 hffaf4a1_0
hdf5 1.10.2 hba1933b_1
icu 58.2 h9c2bf20_1
intel-openmp 2019.4 243
ipython 7.8.0 py36h39e3cac_0
ipython_genutils 0.2.0 py36_0
jasper 2.0.14 h07fcdf6_1
jedi 0.15.1 py36_0
jpeg 9b h024ee3a_2
kiwisolver 1.1.0 pypi_0 pypi
libedit 3.1.20181209 hc058e9b_0
libffi 3.2.1 hd88cf55_4
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libglu 9.0.0 hf484d3e_1
libopencv 3.4.2 hb342d67_1
libopus 1.3 h7b6447c_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.0.10 h2733197_2
libuuid 1.0.3 h1bed415_2
libvpx 1.7.0 h439df22_0
libxcb 1.13 h1bed415_1
libxml2 2.9.9 hea5a465_1
markdown 3.1.1 pypi_0 pypi
matplotlib 3.1.1 pypi_0 pypi
mkl 2019.4 243
mkl-service 2.3.0 py36he904b0f_0
mkl_fft 1.0.14 py36ha843d7b_0
mkl_random 1.1.0 py36hd6b4f25_0
ncurses 6.1 he6710b0_1
ninja 1.9.0 py36hfd86e86_0
numpy 1.17.2 py36haad9e8e_0
numpy-base 1.17.2 py36hde5b4d6_0
olefile 0.46 py36_0
opencv 3.4.2 py36h6fd60c2_1
openssl 1.1.1d h7b6447c_2
parso 0.5.1 py_0
pcre 8.43 he6710b0_0
pexpect 4.7.0 py36_0
pickleshare 0.7.5 py36_0
pillow 6.2.0 py36h34e0f95_0
pip 19.2.3 py36_0
pixman 0.38.0 h7b6447c_0
portalocker 1.5.1 pypi_0 pypi
prompt_toolkit 2.0.10 py_0
protobuf 3.10.0 pypi_0 pypi
ptyprocess 0.6.0 py36_0
py-opencv 3.4.2 py36hb342d67_1
pycocotools 2.0 pypi_0 pypi
pycparser 2.19 py36_0
pygments 2.4.2 py_0
pyparsing 2.4.2 pypi_0 pypi
python 3.6.9 h265db76_0
python-dateutil 2.8.0 pypi_0 pypi
pytorch 1.3.0 py3.6_cuda10.1.243_cudnn7.6.3_0 pytorch
pyyaml 5.1.2 pypi_0 pypi
readline 7.0 h7b6447c_5
setuptools 41.4.0 py36_0
shapely 1.6.4.post2 pypi_0 pypi
six 1.12.0 py36_0
sqlite 3.30.0 h7b6447c_0
tensorboard 2.0.0 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
tk 8.6.8 hbc83047_0
torchvision 0.4.1 py36_cu101 pytorch
tqdm 4.36.1 pypi_0 pypi
traitlets 4.3.3 py36_0
wcwidth 0.1.7 py36_0
werkzeug 0.16.0 pypi_0 pypi
wheel 0.33.6 py36_0
xz 5.2.4 h14c3975_4
yacs 0.1.6 pypi_0 pypi
zlib 1.2.11 h7b6447c_3
zstd 1.3.7 h0b5b093_0
$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
Expected behavior
I expect PyTorch to run primarily on the GPU, not on the CPU as observed above. Since I've made no edits to the code, I also expect it to run error-free.
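To confirm where the model actually ends up, a quick check (a sketch; predictor is the DefaultPredictor built above, and its .model attribute is assumed to hold the underlying network) is to inspect the device of its parameters and the visible CUDA setup:

import torch

print(torch.cuda.is_available())                   # should be True
print(torch.cuda.get_device_name(0))               # should report the Tesla P100
print(next(predictor.model.parameters()).device)   # expected cuda:0, not cpu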
Environment
$ python -m detectron2.utils.collect_env
--------------------- -------------------------------------------------------------------
Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
Detectron2 Compiler GCC 5.4
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.3.0
PyTorch Debug Build False
CUDA available True
GPU 0,1 Tesla P100-PCIE-16GB
Pillow 6.2.0
cv2 3.4.2
--------------------- -------------------------------------------------------------------
PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CUDA Runtime 10.1
- NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_50,code=compute_50
- CuDNN 7.6.3
- Magma 2.5.1
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
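Worth noting in the build info above: the NVCC architecture flags list only sm_35 and sm_50 (plus compute_50 PTX), while a Tesla P100 has compute capability 6.0, so its kernels would have to be JIT-compiled from PTX at load time, which could explain the very long, CPU-bound build_model(cfg). A quick comparison (a sketch; torch.cuda.get_arch_list() only exists in newer PyTorch releases, hence the guard):

import torch

print(torch.version.cuda)                    # CUDA runtime this PyTorch binary was built against
print(torch.cuda.get_device_capability(0))   # compute capability of GPU 0, e.g. (6, 0) for a P100
if hasattr(torch.cuda, "get_arch_list"):
    print(torch.cuda.get_arch_list())        # architectures the installed binary was compiled for (newer PyTorch only)
else:
    print(torch.__config__.show())           # build settings, including the NVCC architecture flags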
Comments
The cudatoolkit version is 10.1.168, while PyTorch 1.3 is built with CUDA 10.1.243. Maybe this is where the problem lies.
Thank you very much for the help! I can confirm that the Conda version of PyTorch from last week wasn’t properly compiled to support my GPU. This has been fixed and the newest PyTorch version downloaded via Conda works error-free.