Duplicated class name in ImageNet dataset.
🐛 Bug
Two classes in the ImageNet dataset (indices 134 and 517) share the same class name but not the same content, so I ran into a KeyError when using class_to_idx.
To Reproduce
Steps to reproduce the behavior:
- Create the dataset with
data = datasets.ImageNet(root, 'train')
- Try to access
data.class_to_idx[134]
- A KeyError is raised.
Expected behavior
No KeyError should be raised here.
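To illustrate why a name-keyed mapping loses an entry when two classes share a name, here is a minimal sketch in plain Python. It assumes class_to_idx is a dict built from class names to indices; the toy class list and its indices are hypothetical and are not the real ImageNet data.

```python
# Toy stand-in for per-class name tuples (hypothetical data, not the real
# ImageNet class list). Positions 1 and 3 share the name "crane".
classes = [
    ("tench",),
    ("crane",),     # first class called "crane"
    ("goldfish",),
    ("crane",),     # second, different class also called "crane"
]

# A name -> index dict can hold only one value per name, so the later
# "crane" silently overwrites the earlier one.
class_to_idx = {name: idx for idx, names in enumerate(classes) for name in names}

print(len(classes))           # 4 classes
print(len(class_to_idx))      # only 3 keys survive
print(class_to_idx["crane"])  # 3: the first "crane" (index 1) is unreachable

# Looking up by integer index fails because the keys are names, not indices.
try:
    class_to_idx[1]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Going from index to name works through the list itself.
print(classes[1])
```

Under these assumptions, index 1 never appears as a value in class_to_idx, while indexing the classes list directly still recovers its name.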
Environment
Please copy and paste the output from our environment collection script (or fill out the checklist below manually).
You can get the script and run it with:
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
PyTorch version: 1.8.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.10
Python version: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.15.0-147-generic-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
Nvidia driver version: 460.67
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.3.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.17.2
[pip3] numpydoc==0.9.1
[pip3] torch==1.8.1+cu102
[pip3] torchattacks==2.14.2
[pip3] torchaudio==0.8.1
[pip3] torchsummary==1.5.1
[pip3] torchvision==0.9.1+cu102
[conda] blas 1.0 mkl https://repo.anaconda.com/pkgs/main
[conda] cudatoolkit 10.0.130 0 defaults
[conda] mkl 2019.4 243 https://repo.anaconda.com/pkgs/main
[conda] mkl-service 2.3.0 py37he904b0f_0 https://repo.anaconda.com/pkgs/main
[conda] mkl_fft 1.0.14 py37ha843d7b_0 https://repo.anaconda.com/pkgs/main
[conda] mkl_random 1.1.0 py37hd6b4f25_0 https://repo.anaconda.com/pkgs/main
[conda] numpy 1.17.2 py37haad9e8e_0 https://repo.anaconda.com/pkgs/main
[conda] numpy-base 1.17.2 py37hde5b4d6_0 https://repo.anaconda.com/pkgs/main
[conda] numpydoc 0.9.1 py_0 https://repo.anaconda.com/pkgs/main
[conda] torch 1.8.1+cu102 pypi_0 pypi
[conda] torchattacks 2.14.2 pypi_0 pypi
[conda] torchaudio 0.8.1 pypi_0 pypi
[conda] torchsummary 1.5.1 pypi_0 pypi
[conda] torchvision 0.9.1+cu102 pypi_0 pypi
Additional context
cc @pmeier
Issue Analytics
- State:
- Created 2 years ago
- Comments:6
Top GitHub Comments
@pmeier Thanks for replying!
I was just trying to create some new datasets following the official torchvision implementation, with similar APIs but some modifications for my own usage (my case here is to use a subset of ImageNet for some quick testing). Anyway, I just got it done with the classes attribute, and it works just fine!
I think it's better to add a suffix when duplicate class names are encountered, for example crane_1 and crane_2. But it's not a big problem, so I'll close this issue now.
Thank you for your attention!
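The suffix idea from the comment above could look roughly like the following. This is only a sketch of that suggestion, not anything torchvision does; the helper name suffix_duplicates and the toy name list are hypothetical.

```python
from collections import Counter


def suffix_duplicates(names):
    """Append _1, _2, ... to names that occur more than once.

    Names that occur exactly once are returned unchanged.
    """
    totals = Counter(names)   # how often each name occurs overall
    seen = Counter()          # how many times each name was seen so far
    result = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            result.append(f"{name}_{seen[name]}")
        else:
            result.append(name)
    return result


# Toy class names (hypothetical data): "crane" occurs twice.
names = ["tench", "crane", "goldfish", "crane"]
unique_names = suffix_duplicates(names)

# With unique names, every index is reachable from the mapping again.
class_to_idx = {name: idx for idx, name in enumerate(unique_names)}
print(unique_names)
print(class_to_idx)
```

The trade-off is that the suffixed names no longer match the original labels, so any code that looks up "crane" directly would need to change.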
WordNet actually describes a taxonomy of objects. So in my example above, grass snake is a valid category for n01729977 and n01735189. Thus, we cannot simply enumerate these into grass_snake_1 and grass_snake_2, because they are the same category.
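One way to sidestep the naming question is to key the mapping by WordNet ID, which is unique per class, and let a name map to every index it applies to. The sketch below assumes only what the comment states (both IDs carry the name grass snake); the indices are toy values, any other names for these synsets are omitted, and the variable names are illustrative.

```python
from collections import defaultdict

# Hypothetical excerpt: WordNet ID -> names for that synset. Only the name
# mentioned in the comment is listed here.
wnid_to_classes = {
    "n01729977": ("grass snake",),
    "n01735189": ("grass snake",),
}

# WordNet IDs are unique, so this mapping cannot collide.
wnids = sorted(wnid_to_classes)
wnid_to_idx = {wnid: idx for idx, wnid in enumerate(wnids)}

# A name may legitimately belong to several classes, so map it to a list
# of indices instead of a single index.
class_to_indices = defaultdict(list)
for wnid in wnids:
    for name in wnid_to_classes[wnid]:
        class_to_indices[name].append(wnid_to_idx[wnid])

print(wnid_to_idx)
print(dict(class_to_indices))
```

This keeps both classes addressable without renaming either of them, at the cost of a name lookup returning a list rather than a single index.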