Strange behavior in detection reference code
Describe the bug
Hi,
I am using your training procedure for object detection https://github.com/pytorch/vision/blob/main/references/detection/train.py
with a custom dataset. When I evaluate my model, the output seems correct for the very first epoch but for the following epochs, the metrics fall to 0.
However, this is not related to the model performance as I can use a checkpoint and evaluate it in another process, which gives back expected values.
From the code, the difference between COCO dataset and a custom dataset happens here: https://github.com/pytorch/vision/blob/f4fd19335fca4dbb987603c08368be9496dd316d/references/detection/coco_utils.py#L203
I suppose that the current behavior is not expected. Have you ever faced a similar issue, and how can I fix it?
Renaud
Versions
Collecting environment information...
PyTorch version: 1.10.1
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: CentOS Linux release 8.2.2004 (Core) (x86_64)
GCC version: (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5)
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.28

Python version: 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.18.0-193.6.3.el8_2.x86_64-x86_64-with-glibc2.28
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 2080 Ti
Nvidia driver version: 450.57
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] torch==1.10.1
[pip3] torchvision==0.11.2
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py39h7f8727e_0
[conda] mkl_fft 1.3.1 py39hd3c417c_0
[conda] mkl_random 1.2.2 py39h51133e4_0
[conda] numpy 1.21.2 py39h20f2e39_0
[conda] numpy-base 1.21.2 py39h79a1101_0
[conda] pytorch 1.10.1 py3.9_cuda11.3_cudnn8.2.0_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchvision 0.11.2 py39_cu113 pytorch
cc @datumbox
Issue Analytics
- Created 2 years ago
- Comments:5 (5 by maintainers)
Top GitHub Comments
@datumbox Indeed the problem comes from
bboxes[:, 2:] -= bboxes[:, :2]
and is solved by cloning it. I also checked that the masks
were not changed with the current version and it works fine.

@rvandeghen Thanks for raising this.
It is very difficult to tell what's the problem given that I can't reproduce the issue without having your custom dataset. What's unclear to me is why you need to deep-copy the ds given you don't modify it.

The reference script works fine with COCO, which is its intended use case. Moreover, the script serves as a starting point for how one can build their own loops. With the info I have at this point, I don't think applying the deep-copy patch in the general case is worth it.
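For readers hitting the same symptom, the in-place update identified in the thread, bboxes[:, 2:] -= bboxes[:, :2], can be reproduced in isolation. The sketch below is not the reference script itself. It assumes a custom dataset that hands back the same tensor object on every access (for example because targets are cached in memory), and the helper names to_xywh_inplace and to_xywh_cloned are hypothetical. Under that assumption, each conversion from (x1, y1, x2, y2) to (x, y, w, h) shrinks the stored boxes again, while cloning first leaves the stored tensor untouched.

```python
import torch


def to_xywh_inplace(target):
    # Hypothetical helper mirroring the line quoted in the thread:
    # `boxes` is the very tensor stored in `target`, so the in-place
    # subtraction also rewrites the stored boxes.
    boxes = target["boxes"]
    boxes[:, 2:] -= boxes[:, :2]
    return boxes.tolist()


def to_xywh_cloned(target):
    # Same conversion on a private copy; the stored tensor is untouched.
    boxes = target["boxes"].clone()
    boxes[:, 2:] -= boxes[:, :2]
    return boxes.tolist()


# Assumption: the dataset keeps its targets in memory and returns the
# same tensor object each time an item is requested.
cached = {"boxes": torch.tensor([[10.0, 20.0, 50.0, 80.0]])}
first_inplace = to_xywh_inplace(cached)    # [[10.0, 20.0, 40.0, 60.0]]
second_inplace = to_xywh_inplace(cached)   # [[10.0, 20.0, 30.0, 40.0]]

fresh = {"boxes": torch.tensor([[10.0, 20.0, 50.0, 80.0]])}
first_cloned = to_xywh_cloned(fresh)       # [[10.0, 20.0, 40.0, 60.0]]
second_cloned = to_xywh_cloned(fresh)      # [[10.0, 20.0, 40.0, 60.0]]
```

A dataset that builds fresh tensors on every access would not show this drift, which may be why the behavior does not appear with every dataset.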