`torchvision` breaks in official `pytorch` Docker image: `RuntimeError: Couldn't load custom C++ ops.`
🐛 Bug
I’m using the `pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime` Docker image and trying to install torchvision on top. The installation proceeds as expected, but if I try to call a function that uses custom C++ ops (such as `torchvision.ops.nms`), I get the following error message:
```
RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.
```
I can confirm that the installed versions are compatible by bashing into the container and opening a Python prompt:

```python
>>> import torch
>>> torch.__version__
'1.9.0'
>>> import torchvision
>>> torchvision.__version__
'0.10.0'
>>> import torchvision.ops
```
This issue occurs regardless of whether I install torchvision by:
- Using `pip`, i.e., `RUN pip install torchvision` (a sketch of this variant follows the list)
- Using `conda` without a version pin, i.e., `RUN conda install -c pytorch torchvision`
- Using `conda` with a version pin, i.e., `RUN conda install -c pytorch torchvision=0.10.0`
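For completeness, the `pip` variant of the image looks roughly like this. This is a sketch only, assuming the same base image and `test.py` used in the reproduction steps below; per the above, it fails with the same error:

```dockerfile
# Sketch of the pip-based install variant (the repro below uses conda instead).
FROM pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime

# Install torchvision with pip instead of conda; this container hits the
# same "Couldn't load custom C++ ops" error when test.py runs.
RUN pip install torchvision

COPY ./test.py ./test.py
ENTRYPOINT ["python", "test.py"]
```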
To Reproduce
Steps to reproduce the behavior, in a new directory:
- Create a minimal `Dockerfile` with the following content:

```dockerfile
FROM pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime
RUN conda install -c pytorch torchvision
COPY ./test.py ./test.py
ENTRYPOINT ["python", "test.py"]
```

- Create a minimal `test.py` with the following content:

```python
import torchvision.ops
torchvision.ops.nms(None, None, 0.0)
```

- Build and run the container:

```bash
docker build -t torchvisiondockerbug . && docker run torchvisiondockerbug
```
- Observe the following output:

```
Traceback (most recent call last):
File "test.py", line 3, in <module>
torchvision.ops.nms(None, None, 0.0)
File "/opt/conda/lib/python3.7/site-packages/torchvision/ops/boxes.py", line 34, in nms
_assert_has_ops()
File "/opt/conda/lib/python3.7/site-packages/torchvision/extension.py", line 63, in _assert_has_ops
"Couldn't load custom C++ ops. This can happen if your PyTorch and "
RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.
```
Expected behavior
I expect to be able to load custom C++ ops, because torch 1.9.0 and torchvision 0.10.0 are marked as compatible in torchvision’s compatibility matrix.
In a working environment, the output of `test.py` looks like this:

```
Traceback (most recent call last):
File "test.py", line 3, in <module>
torchvision.ops.nms(None, None, 0.0)
File "/home/joe/.pyenv/versions/pytorch_problem/lib/python3.7/site-packages/torchvision/ops/boxes.py", line 35, in nms
return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
RuntimeError: torchvision::nms() Expected a value of type 'Tensor' for argument 'dets' but instead found type 'NoneType'.
Position: 0
Value: None
Declaration: torchvision::nms(Tensor dets, Tensor scores, float iou_threshold) -> (Tensor)
Cast error details: Unable to cast Python instance to C++ type (compile in debug mode for details)
```

(Yes, this is still an error, but it at least demonstrates that `_assert_has_ops` succeeds.)
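To show what a fully working install produces, the same op can be called with real tensors instead of `None`. This is a minimal sketch with made-up box data, not part of the original report:

```python
import torch
import torchvision.ops

# Two heavily overlapping boxes in (x1, y1, x2, y2) format, with scores.
boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0],
                      [1.0, 1.0, 11.0, 11.0]])
scores = torch.tensor([0.9, 0.8])

# In a working environment this prints tensor([0]): the lower-scoring box is
# suppressed at IoU threshold 0.5. In the broken container the call raises
# the same "Couldn't load custom C++ ops" RuntimeError instead.
print(torchvision.ops.nms(boxes, scores, 0.5))
```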
Environment
Output of running `collect_env.py` inside the Docker container:

```
Collecting environment information...
PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.10
Python version: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-5.4.72-microsoft-standard-WSL2-x86_64-with-debian-buster-sid
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.9.0
[pip3] torchelastic==0.2.0
[pip3] torchtext==0.10.0
[pip3] torchvision==0.10.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 h6bb024c_0 nvidia
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.2.0 h06a4308_296
[conda] mkl-service 2.3.0 py37h27cfd23_1
[conda] mkl_fft 1.3.0 py37h42c9631_2
[conda] mkl_random 1.2.1 py37ha9443f7_2
[conda] numpy 1.20.2 py37h2d18471_0
[conda] numpy-base 1.20.2 py37hfae3a4d_0
[conda] pytorch 1.9.0 py3.7_cuda10.2_cudnn7.6.5_0 pytorch
[conda] torchelastic 0.2.0 pypi_0 pypi
[conda] torchtext 0.10.0 py37 pytorch
[conda] torchvision 0.10.0 py37_cu102 pytorch
```
Top GitHub Comments
- FYI: I was able to get torchvision to work using the `pytorch/pytorch:1.9.0-cuda11.1-cudnn8-devel` container. (See the Dockerfile sketch after this list.)
- Having the same issue with `1.9.0-cuda11.1-cudnn8-runtime`.
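Based on that comment, a Dockerfile sketch of the workaround (untested here; it assumes the same `test.py` from the reproduction steps) would be:

```dockerfile
# Workaround sketch: swap the -runtime base image for the -devel one,
# which a commenter reports lets torchvision's custom C++ ops load.
FROM pytorch/pytorch:1.9.0-cuda11.1-cudnn8-devel
RUN conda install -c pytorch torchvision

COPY ./test.py ./test.py
ENTRYPOINT ["python", "test.py"]
```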