
Mask RCNN does not support tracing on GPU


To reproduce

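```python
import torch

...
model = ...  # the Mask R-CNN model (construction elided in the original report)
model = model.cuda()  # move parameters and buffers to the GPU
model.eval()

inp = torch.Tensor(data_np).cuda()  # data_np: input array prepared earlier (not shown)

with torch.no_grad():
    out = model(inp)  # eager forward pass
    script_module = do_trace(model, inp)  # do_trace: the reporter's tracing helper (not shown)
```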

However, I get:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! ......
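The RuntimeError above means that a single operation received tensors on two different devices (cuda:0 and cpu). One quick check, sketched here under the assumption that the model is an ordinary `nn.Module`, is to group every parameter and buffer by the device it lives on: more than one key means something was not moved by `model.cuda()`. The helper name `tensor_devices` and the toy model are hypothetical, not part of the original issue. This check cannot see tensors created inside `forward()` without an explicit `device=` argument; those default to the CPU and are another common source of this error.

```python
from collections import defaultdict

import torch
import torch.nn as nn


def tensor_devices(module: nn.Module) -> dict:
    """Group a module's parameter and buffer names by device (hypothetical helper)."""
    by_device = defaultdict(list)
    for name, param in module.named_parameters():
        by_device[str(param.device)].append(name)
    for name, buf in module.named_buffers():
        by_device[str(buf.device)].append(name)
    return dict(by_device)


if __name__ == "__main__":
    # Toy stand-in for the real detection model: Linear weights plus BatchNorm buffers.
    model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))
    if torch.cuda.is_available():
        model = model.cuda()
    model.eval()
    # A single key (e.g. "cuda:0") means every parameter and buffer was moved;
    # more than one key points at the stragglers.
    print(tensor_devices(model))
```

Running the same helper on the real model right before tracing narrows the search to either a stray parameter/buffer or a tensor created during the forward pass.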

Environment

```
PyTorch version: 1.9.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 8.0.1 (tags/RELEASE_801/final)
CMake version: version 3.19.6
Libc version: glibc-2.17

Python version: 3.7.10 (default, Jun 4 2021, 14:48:32) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-1049-aws-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: 11.0.221
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 450.119.03
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.9.0
[pip3] torchaudio==0.9.0
[pip3] torchvision==0.10.0
[conda] numpy 1.19.5 pypi_0 pypi
[conda] torch 1.9.0 pypi_0 pypi
[conda] torchaudio 0.9.0 pypi_0 pypi
[conda] torchvision 0.10.0 pypi_0 pypi
```

cc @datumbox

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

2 reactions
masahi commented, Oct 23, 2021

Sorry: if I import PyTorch before TVM, the error I got while tracing MaskRCNN goes away. Tracing and ONNX export still work with this release.

2 reactions
masahi commented, Oct 23, 2021

Unfortunately, with release v0.11.0, tracing on CPU no longer works either. torch.jit.script is not useful for downstream backends because of all the irrelevant Python junk it generates, so end-to-end conversion from TorchScript to other IRs is basically impossible.
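The trace/script distinction in this comment can be made concrete with a toy function (`sign_gate` is illustrative, not from the issue). `torch.jit.trace` runs the function once on an example input and records only the operations that actually executed, so data-dependent Python control flow is baked in as whichever branch the example took. `torch.jit.script` compiles the Python source instead, so the `if` survives as a `prim::If` node, which is the kind of extra structure a downstream converter then has to handle.

```python
import warnings

import torch


def sign_gate(x: torch.Tensor) -> torch.Tensor:
    # Data-dependent control flow: the branch taken depends on the values in x.
    if x.sum() > 0:
        return x * 2
    return -x


example = torch.ones(3)  # positive example input, so tracing records the `x * 2` branch

with warnings.catch_warnings():
    # The tracer warns that converting a tensor to a Python bool cannot be recorded.
    warnings.simplefilter("ignore", torch.jit.TracerWarning)
    traced = torch.jit.trace(sign_gate, example)

scripted = torch.jit.script(sign_gate)

neg = torch.tensor([-1.0, -2.0, -3.0])
print(traced(neg))    # branch fixed at trace time: computes x * 2
print(scripted(neg))  # `if` preserved in the graph: computes -x
print("prim::If" in str(scripted.graph), "prim::If" in str(traced.graph))
```

A traced graph is flat and easy to convert, but it is only correct for inputs that follow the same control-flow path as the example; that trade-off is why detection models, whose post-processing branches on data, are hard to trace reliably.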

In particular, ONNX export also uses tracing, so ONNX export of MaskRCNN appears to be broken with the new release. I'm not sure how test_onnx.py is running the MaskRCNN test: https://github.com/pytorch/vision/blob/main/test/test_onnx.py#L485.

I want to look into what broke tracing of MaskRCNN and, if the fix is simple, revive the discussion on tracing and ONNX export support for detection models.


Top Results From Across the Web

Mask-RCNN Tracing fails · Issue #72920 · pytorch ... - GitHub
The reason that tracing is failing is as follows. First, you call torch.jit.trace and the module is traced. Then, torch.jit.trace traces the ...
OpenCV 'dnn' with NVIDIA GPUs: 1549% faster YOLO, SSD ...
In this tutorial, you'll learn how to use OpenCV's “dnn” module with an NVIDIA GPU for up to 1,549% faster object detection (YOLO...
Run Mask R-CNN on GPU with Pytorch (on Ubuntu) - Pysource
In this tutorial we are going to see how to run the Mask R-CNN algorithm using the GPU on the Ubuntu operating system....
MATLAB trainMaskRCNN - MathWorks
A trained Mask R-CNN network object can perform instance segmentation to detect and segment multiple object classes. This syntax supports transfer learning ...
MaskRCNN — TAO Toolkit 3.22.05 documentation
--gpus num_gpus : The number of GPUs to use and processes to launch for training. The default value is 1. ... MaskRCNN does...
