Missing `fixed_size` value in GeneralizedRCNNTransform breaks Faster-RCNN torchscript loading with C++ in train mode

🐛 Describe the bug

The fixed_size value in the GeneralizedRCNNTransform instantiation for faster_rcnn defaults to None, which breaks TorchScript inference in C++.

See https://github.com/pytorch/vision/blob/7947fc8fb38b1d3a2aca03f22a2e6a3caa63f2a0/torchvision/models/detection/faster_rcnn.py#L234 and compare to https://github.com/pytorch/vision/blob/7947fc8fb38b1d3a2aca03f22a2e6a3caa63f2a0/torchvision/models/detection/ssd.py#L203 where fixed_size is explicitly set.

Thus with faster_rcnn, fixed_size defaults to None and loading from C++ yields:

Dynamic exception type: torch::jit::ErrorReport
std::exception::what: 
Unknown type name 'NoneType':
Serialized   File "code/__torch__/torchvision/models/detection/transform.py", line 11
  image_std : List[float]
  size_divisible : int
  fixed_size : NoneType
               ~~~~~~~~ <--- HERE
  def forward(self: __torch__.torchvision.models.detection.transform.GeneralizedRCNNTransform,
    images: List[Tensor],

To reproduce, we export the model with torch.jit.script for fasterrcnn_resnet50_fpn and we load from C++ with torch::jit::load().

Actually the exact export Python code we use is here: https://github.com/jolibrain/deepdetect/blob/master/tools/torch/trace_torchvision.py and we run:

python3 trace_torchvision.py fasterrcnn_resnet50_fpn --num_classes 2
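
For readers who do not want to pull the linked script, the export step can be approximated with a short sketch. This is not the deepdetect script: the function name `export_scripted_detector` and the output path are hypothetical, and the `pretrained=False` / `pretrained_backbone=False` keywords assume the torchvision 0.10 API listed under Versions (newer releases use `weights=None` / `weights_backbone=None` instead).

```python
def export_scripted_detector(model_name: str, num_classes: int, out_path: str) -> str:
    """Script a torchvision detection model in train mode and save it.

    Hypothetical helper for illustration; not the deepdetect export script.
    """
    # Imported lazily so the helper can be defined without PyTorch installed.
    import torch
    import torchvision

    # Look up the constructor by name, e.g. "fasterrcnn_resnet50_fpn".
    constructor = getattr(torchvision.models.detection, model_name)

    # Assumption: torchvision 0.10 keyword names; no weights are downloaded.
    model = constructor(
        pretrained=False,
        pretrained_backbone=False,
        num_classes=num_classes,
    )

    # The issue concerns a model exported in train mode.
    model.train()

    # torch.jit.script compiles the module to TorchScript; save() writes the
    # archive that torch::jit::load() reads on the C++ side.
    scripted = torch.jit.script(model)
    scripted.save(out_path)
    return out_path
```

Calling `export_scripted_detector("fasterrcnn_resnet50_fpn", 2, "fasterrcnn_resnet50_fpn.pt")` would write a file that can then be passed to `torch::jit::load()` from C++.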

Versions

PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.21.1
Libc version: glibc-2.25

Python version: 3.6.9 (default, Jan 26 2021, 15:33:00)  [GCC 8.4.0] (64-bit runtime)
Python platform: Linux-4.15.0-151-generic-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: NVIDIA RTX A5000
GPU 1: NVIDIA TITAN X (Pascal)
GPU 2: NVIDIA GeForce GTX 1080 Ti

Nvidia driver version: 470.57.02
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.2
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.18.1
[pip3] torch==1.9.0+cu111
[pip3] torchaudio==0.9.0
[pip3] torchfile==0.1.0
[pip3] torchvision==0.10.0+cu111
[pip3] torchviz==0.0.1
[conda] Could not collect

cc @datumbox

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

2 reactions
beniz commented, Sep 6, 2021

@datumbox Thanks very much. So further tests on our side did reveal that the C++ build was using torch 1.8, while with 1.9 there’s no error. My deepest apologies for the time required on your side, maybe PR #4369 remains useful if only by principle of having properly typed signatures. I’m closing this issue, thanks again for this and for the excellent work by the torchvision team!
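
The cause reported in this comment, a loader built against an older torch than the one used for export, can be caught early with a simple version comparison. The sketch below is only an illustration: `major_minor` and `loader_is_older` are hypothetical helpers, the export version would come from `torch.__version__`, and the loader version would have to be obtained from the LibTorch build on the C++ side. The string `1.8.0` is a placeholder, since the comment only says "torch 1.8".

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.9.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def loader_is_older(export_version: str, loader_version: str) -> bool:
    """Return True when the loading runtime predates the exporting one."""
    return major_minor(loader_version) < major_minor(export_version)


# Versions from this issue: exported with 1.9.0+cu111, loaded with a 1.8 build.
if loader_is_older("1.9.0+cu111", "1.8.0"):
    print("warning: the loader is older than the torch used for export")
```

Comparing only major and minor numbers is a deliberate simplification; it ignores patch releases and local build tags such as `+cu111`.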

1 reaction
datumbox commented, Sep 6, 2021

@beniz I’ve temporarily modified a similar test that we have at vision here to export the model in train mode. I then passed data through it and I don’t get any errors, see here.

Without being able to properly reproduce the error you see, it’s hard to provide guidance. Would you be able to send a dummy PR where you modify the above scripts in a way that they get similar to your setup and reproduce the error on our CI (see the linked commit above for example)? If you manage to reproduce it with a minimal example, I can help you investigate further.
