Missing `fixed_size` value in GeneralizedRCNNTransform breaks Faster-RCNN torchscript loading with C++ in train mode
🐛 Describe the bug
The `fixed_size` value in the `GeneralizedRCNNTransform` instantiation for faster_rcnn defaults to `None`, which breaks TorchScript inference in C++.
See https://github.com/pytorch/vision/blob/7947fc8fb38b1d3a2aca03f22a2e6a3caa63f2a0/torchvision/models/detection/faster_rcnn.py#L234 and compare to https://github.com/pytorch/vision/blob/7947fc8fb38b1d3a2aca03f22a2e6a3caa63f2a0/torchvision/models/detection/ssd.py#L203, where `fixed_size` is explicitly set.
Thus with faster_rcnn, `fixed_size` defaults to `None`, and loading from C++ yields:
```
Dynamic exception type: torch::jit::ErrorReport
std::exception::what:
Unknown type name 'NoneType':
Serialized File "code/__torch__/torchvision/models/detection/transform.py", line 11
    image_std : List[float]
    size_divisible : int
    fixed_size : NoneType
                 ~~~~~~~~ <--- HERE
  def forward(self: __torch__.torchvision.models.detection.transform.GeneralizedRCNNTransform,
    images: List[Tensor],
```
To reproduce, we export the model with `torch.jit.script` for `fasterrcnn_resnet50_fpn`, and we load it from C++ with `torch::jit::load()`.
The exact export Python code we use is here: https://github.com/jolibrain/deepdetect/blob/master/tools/torch/trace_torchvision.py, and we run: `python3 trace_torchvision.py fasterrcnn_resnet50_fpn --num_classes 2`
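For context on why a bare `NoneType` ends up in the serialized module, here is a hedged, minimal stand-in (a hypothetical `TransformLike` module, not torchvision's actual `GeneralizedRCNNTransform`) showing how an explicit `Optional` annotation on the constructor parameter lets TorchScript type the attribute as `Optional[Tuple[int, int]]` instead of inferring `NoneType` when the default is `None`:

```python
from typing import Optional, Tuple

import torch
import torch.nn.functional as F


class TransformLike(torch.nn.Module):
    """Hypothetical stand-in sketching the fixed_size handling; not torchvision code."""

    def __init__(self, fixed_size: Optional[Tuple[int, int]] = None):
        super().__init__()
        # The Optional annotation on the parameter lets TorchScript type this
        # attribute as Optional[Tuple[int, int]] rather than a bare NoneType.
        self.fixed_size = fixed_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.fixed_size is not None:
            return F.interpolate(x, size=self.fixed_size)
        return x


scripted = torch.jit.script(TransformLike())            # fixed_size stays None
scripted_fixed = torch.jit.script(TransformLike((2, 2)))
```

An unannotated `self.fixed_size = None` assignment, by contrast, gives TorchScript nothing to widen the type with, which matches the `fixed_size : NoneType` line in the error above.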
Versions
```
PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.21.1
Libc version: glibc-2.25
Python version: 3.6.9 (default, Jan 26 2021, 15:33:00) [GCC 8.4.0] (64-bit runtime)
Python platform: Linux-4.15.0-151-generic-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: NVIDIA RTX A5000
GPU 1: NVIDIA TITAN X (Pascal)
GPU 2: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 470.57.02
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.2
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.18.1
[pip3] torch==1.9.0+cu111
[pip3] torchaudio==0.9.0
[pip3] torchfile==0.1.0
[pip3] torchvision==0.10.0+cu111
[pip3] torchviz==0.0.1
[conda] Could not collect
```
cc @datumbox
Issue Analytics
- State: closed
- Created: 2 years ago
- Comments: 10 (5 by maintainers)
Top GitHub Comments
@datumbox Thanks very much. Further tests on our side revealed that the C++ build was using torch 1.8, while with 1.9 there's no error. My deepest apologies for the time required on your side; maybe PR #4369 remains useful, if only on the principle of having properly typed signatures. I'm closing this issue. Thanks again for this and for the excellent work by the torchvision team!
@beniz I've temporarily modified a similar test that we have at vision here to export the model in train mode. I then passed data through it and I don't get any errors, see here.
Without being able to properly reproduce the error you see, it's hard to provide guidance. Would you be able to send a dummy PR where you modify the above scripts so that they resemble your setup and reproduce the error on our CI (see the linked commit above for an example)? If you manage to reproduce it with a minimal example, I can help you investigate further.
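Without a C++ toolchain at hand, a Python-side save/load round trip exercises the same serialized archive format that `torch::jit::load()` reads, so it can serve as a first sanity check. This is only a sketch with a toy module (the unannotated `None` attribute mimics the report), not the actual Faster R-CNN repro:

```python
import io

import torch


class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fixed_size = None  # unannotated None, as in the report

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x


scripted = torch.jit.script(M())
buf = io.BytesIO()
torch.jit.save(scripted, buf)   # same archive format the C++ loader reads
buf.seek(0)
reloaded = torch.jit.load(buf)  # Python-side analogue of torch::jit::load()
```

On torch 1.9 this round trip succeeds; per the thread above, the failure was specific to loading with the 1.8 C++ runtime.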