`torchvision` breaks in official `pytorch` Docker image: `RuntimeError: Couldn't load custom C++ ops.`
🐛 Bug
I’m using the `pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime` Docker image and trying to install torchvision on top. The installation proceeds as expected, but if I try to call a function that uses custom C++ ops (such as `torchvision.ops.nms`), I get the following error message:
```
RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.
```
I can confirm that the installed versions are compatible by bashing into the container and opening a Python prompt:

```python
>>> import torch
>>> torch.__version__
'1.9.0'
>>> import torchvision
>>> torchvision.__version__
'0.10.0'
>>> import torchvision.ops
```
This issue occurs regardless of whether I install torchvision by:
- Using `pip`, i.e., `RUN pip install torchvision` (a sketch of this variant follows the list)
- Using `conda` without a version pin, i.e., `RUN conda install -c pytorch torchvision`
- Using `conda` with a version pin, i.e., `RUN conda install -c pytorch torchvision=0.10.0`
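For completeness, the `pip` variant of the image looks roughly like this. This is a sketch only, assuming the same base image and `test.py` used in the reproduction steps below; per the above, it fails with the same error:

```dockerfile
# Sketch of the pip-based install variant (the repro below uses conda instead).
FROM pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime

# Install torchvision with pip instead of conda; this container hits the
# same "Couldn't load custom C++ ops" error when test.py runs.
RUN pip install torchvision

COPY ./test.py ./test.py
ENTRYPOINT ["python", "test.py"]
```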
To Reproduce
Steps to reproduce the behavior, in a new directory:
- Create a minimal `Dockerfile` with the following content:

```dockerfile
FROM pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime
RUN conda install -c pytorch torchvision
COPY ./test.py ./test.py
ENTRYPOINT ["python", "test.py"]
```

- Create a minimal `test.py` with the following content:

```python
import torchvision.ops
torchvision.ops.nms(None, None, 0.0)
```

- Build and run the container:

```bash
docker build -t torchvisiondockerbug . && docker run torchvisiondockerbug
```
- Observe the following output:

```
Traceback (most recent call last):
File "test.py", line 3, in <module>
torchvision.ops.nms(None, None, 0.0)
File "/opt/conda/lib/python3.7/site-packages/torchvision/ops/boxes.py", line 34, in nms
_assert_has_ops()
File "/opt/conda/lib/python3.7/site-packages/torchvision/extension.py", line 63, in _assert_has_ops
"Couldn't load custom C++ ops. This can happen if your PyTorch and "
RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.
```
Expected behavior
I expect to be able to load custom C++ ops, because torch 1.9.0 and torchvision 0.10.0 are marked as compatible in torchvision’s compatibility matrix.
In a working environment, the output of `test.py` looks like this:

```
Traceback (most recent call last):
File "test.py", line 3, in <module>
torchvision.ops.nms(None, None, 0.0)
File "/home/joe/.pyenv/versions/pytorch_problem/lib/python3.7/site-packages/torchvision/ops/boxes.py", line 35, in nms
return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
RuntimeError: torchvision::nms() Expected a value of type 'Tensor' for argument 'dets' but instead found type 'NoneType'.
Position: 0
Value: None
Declaration: torchvision::nms(Tensor dets, Tensor scores, float iou_threshold) -> (Tensor)
Cast error details: Unable to cast Python instance to C++ type (compile in debug mode for details)
```

(Yes, this is still an error, but it at least demonstrates that `_assert_has_ops` succeeds.)
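To show what a fully working install produces, the same op can be called with real tensors instead of `None`. This is a minimal sketch with made-up box data, not part of the original report:

```python
import torch
import torchvision.ops

# Two heavily overlapping boxes in (x1, y1, x2, y2) format, with scores.
boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0],
                      [1.0, 1.0, 11.0, 11.0]])
scores = torch.tensor([0.9, 0.8])

# In a working environment this prints tensor([0]): the lower-scoring box is
# suppressed at IoU threshold 0.5. In the broken container the call raises
# the same "Couldn't load custom C++ ops" RuntimeError instead.
print(torchvision.ops.nms(boxes, scores, 0.5))
```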
Environment
Output of running `collect_env.py` inside the Docker container:

```
Collecting environment information...
PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.10
Python version: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-5.4.72-microsoft-standard-WSL2-x86_64-with-debian-buster-sid
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.9.0
[pip3] torchelastic==0.2.0
[pip3] torchtext==0.10.0
[pip3] torchvision==0.10.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 h6bb024c_0 nvidia
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.2.0 h06a4308_296
[conda] mkl-service 2.3.0 py37h27cfd23_1
[conda] mkl_fft 1.3.0 py37h42c9631_2
[conda] mkl_random 1.2.1 py37ha9443f7_2
[conda] numpy 1.20.2 py37h2d18471_0
[conda] numpy-base 1.20.2 py37hfae3a4d_0
[conda] pytorch 1.9.0 py3.7_cuda10.2_cudnn7.6.5_0 pytorch
[conda] torchelastic 0.2.0 pypi_0 pypi
[conda] torchtext 0.10.0 py37 pytorch
[conda] torchvision 0.10.0 py37_cu102 pytorch
```
Top GitHub Comments
- FYI: I was able to get torchvision to work using the `pytorch/pytorch:1.9.0-cuda11.1-cudnn8-devel` container. (See the Dockerfile sketch after this list.)
- Having the same issue with `1.9.0-cuda11.1-cudnn8-runtime`.
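Based on that comment, a Dockerfile sketch of the workaround (untested here; it assumes the same `test.py` from the reproduction steps) would be:

```dockerfile
# Workaround sketch: swap the -runtime base image for the -devel one,
# which a commenter reports lets torchvision's custom C++ ops load.
FROM pytorch/pytorch:1.9.0-cuda11.1-cudnn8-devel
RUN conda install -c pytorch torchvision

COPY ./test.py ./test.py
ENTRYPOINT ["python", "test.py"]
```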