Mask R-CNN does not support tracing on GPU
To reproduce
```python
...
model = ...
model = model.cuda()
model.eval()

inp = torch.Tensor(data_np).cuda()

with torch.no_grad():
    out = model(inp)
    script_module = do_trace(model, inp)
```
However, this fails with:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! ......
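The `do_trace` helper is elided in the snippet above; it is presumably a thin wrapper around `torch.jit.trace`, similar to the one used in the TVM object-detection tutorial. A minimal sketch (the name and structure are assumptions, not confirmed by the issue):

```python
import torch

def do_trace(model, inp):
    # Hypothetical helper: trace the model with one example input.
    # strict=False lets the tracer accept the list/dict outputs that
    # torchvision detection models return.
    model_trace = torch.jit.trace(model, inp, strict=False)
    model_trace.eval()
    return model_trace
```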
Environment
PyTorch version: 1.9.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 8.0.1 (tags/RELEASE_801/final)
CMake version: version 3.19.6
Libc version: glibc-2.17

Python version: 3.7.10 (default, Jun 4 2021, 14:48:32) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-1049-aws-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: 11.0.221
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 450.119.03
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.9.0
[pip3] torchaudio==0.9.0
[pip3] torchvision==0.10.0
[conda] numpy 1.19.5 pypi_0 pypi
[conda] torch 1.9.0 pypi_0 pypi
[conda] torchaudio 0.9.0 pypi_0 pypi
[conda] torchvision 0.10.0 pypi_0 pypi
cc @datumbox
Top GitHub Comments
Sorry, my mistake: if I import PyTorch before TVM, the error I was getting while tracing MaskRCNN goes away. Tracing and ONNX export still work with this release.
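A minimal sketch of the import ordering described above (the exact modules imported are an assumption; the point is only that `torch` comes before `tvm`):

```python
# Importing torch before tvm avoids the device-mismatch error during tracing.
import torch  # must be imported first
import tvm
import torchvision
```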
Unfortunately, with the v0.11.0 release, tracing on CPU no longer works either.
`torch.jit.script` is not useful for downstream backends because of all the irrelevant Python junk it generates, so end-to-end conversion from TorchScript to other IRs is basically impossible. In particular, ONNX export also uses tracing, so ONNX export of MaskRCNN appears to be broken with the new release. I'm not sure how `test_onnx.py` is running the MaskRCNN test: https://github.com/pytorch/vision/blob/main/test/test_onnx.py#L485. I want to take a look at what broke tracing MaskRCNN, and if the fix is simple, I'd like to revive the discussion on tracing / ONNX export support for detection models.
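Since `torch.onnx.export` traces the model internally, a tracing regression would surface there as well. A minimal sketch of such an export (the model choice, input size, and opset are illustrative assumptions, not taken from the issue):

```python
import io
import torch
import torchvision

# torch.onnx.export runs torch.jit tracing under the hood, so a
# tracing regression in MaskRCNN breaks this call too.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

images = [torch.rand(3, 320, 320)]  # detection models take a list of 3xHxW images
with torch.no_grad():
    torch.onnx.export(model, (images,), io.BytesIO(), opset_version=11)
```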