
Mask RCNN does not support tracing on GPU


To reproduce

...
model = ...
model = model.cuda()
model.eval()

inp = torch.Tensor(data_np).cuda()

with torch.no_grad():
    out = model(inp)
    script_module = do_trace(model, inp)

However, I get:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! ......
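One way to narrow down an error like this is to check whether any registered parameter or buffer was left on the CPU after `model.cuda()`. The helper below is a minimal sketch, not part of the original report: the name `group_tensors_by_device` is hypothetical, and it relies only on the `named_parameters()` and `named_buffers()` methods that `torch.nn.Module` provides. It cannot see tensors created inside `forward`, so an all-GPU result only rules out one possible cause.

```python
from collections import defaultdict


def group_tensors_by_device(module):
    """Return {device string: [names]} for a module's parameters and buffers.

    Hypothetical helper for illustration. `module` is expected to behave like
    a torch.nn.Module, i.e. expose named_parameters() and named_buffers().
    """
    by_device = defaultdict(list)
    for name, tensor in module.named_parameters():
        by_device[str(tensor.device)].append(name)
    for name, tensor in module.named_buffers():
        by_device[str(tensor.device)].append(name)
    return dict(by_device)
```

Calling `group_tensors_by_device(model)` right before tracing shows which names sit on which device. If every entry is on `cuda:0`, the CPU tensor in the error message is likely being created during the forward pass rather than stored on the model.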

Environment

PyTorch version: 1.9.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 8.0.1 (tags/RELEASE_801/final)
CMake version: version 3.19.6
Libc version: glibc-2.17

Python version: 3.7.10 (default, Jun 4 2021, 14:48:32) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-1049-aws-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: 11.0.221
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 450.119.03
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.9.0
[pip3] torchaudio==0.9.0
[pip3] torchvision==0.10.0
[conda] numpy 1.19.5 pypi_0 pypi
[conda] torch 1.9.0 pypi_0 pypi
[conda] torchaudio 0.9.0 pypi_0 pypi
[conda] torchvision 0.10.0 pypi_0 pypi

cc @datumbox

Issue Analytics

Created 2 years ago
Comments: 7 (6 by maintainers)

Top GitHub Comments

2 reactions

masahi commented, Oct 23, 2021

Sorry, if I import PyTorch before TVM, the error I got during tracing MaskRCNN is gone. Tracing and ONNX export still work with this release.

2 reactions

masahi commented, Oct 23, 2021

Unfortunately, with release v0.11.0, tracing on CPU no longer works either. torch.jit.script is not useful for downstream backends because of all the irrelevant Python junk it generates, so end-to-end conversion from TorchScript to other IRs is basically impossible.

In particular, ONNX export also uses tracing, so ONNX export of MaskRCNN appears to be broken with the new release. I’m not sure how test_onnx.py is running the MaskRCNN test: https://github.com/pytorch/vision/blob/main/test/test_onnx.py#L485.

I want to take a look at what broke tracing MaskRCNN, and if the fix is simple, I’d like to revive discussion on tracing / ONNX export support of detection models.
