Duplicated class name in ImageNet dataset.


šŸ› Bug

There are two classes in the ImageNet dataset (134 and 517) that share the same class name but not the same content. As a result, I ran into KeyErrors when using class_to_idx.

To Reproduce

Steps to reproduce the behavior:

  1. Create the dataset with data = datasets.ImageNet(root, 'train')
  2. Try to get data.class_to_idx[134]
  3. A KeyError is raised.
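The collision can be reproduced without downloading ImageNet. This is a minimal sketch, assuming class_to_idx is built by a dict comprehension over per-index name tuples (as torchvision's ImageNet appears to do) and using a small hand-picked excerpt of the label set. IDX_TO_NAMES, idx_to_class and the WordNet IDs in the comments are illustrative. Since class_to_idx is keyed by name, an index lookup presumably goes through an inverted mapping, and that is where index 134 disappears.

```python
from collections import Counter

# Illustrative excerpt of ImageNet-1k labels: class index -> tuple of names.
# (torchvision's ImageNet stores the full list as `dataset.classes`.)
IDX_TO_NAMES = {
    134: ("crane",),                # n02012849, the bird
    517: ("crane",),                # n03126707, the machine
    638: ("maillot",),              # n03710637
    639: ("maillot", "tank suit"),  # n03710721
}

# Name -> index, built with a dict comprehension: when a name repeats,
# the later index silently overwrites the earlier one.
class_to_idx = {name: idx for idx, names in IDX_TO_NAMES.items() for name in names}

# Inverting it gives index -> name; indices whose names were all overwritten vanish.
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

# Names that label more than one index.
name_counts = Counter(name for names in IDX_TO_NAMES.values() for name in names)
duplicates = [name for name, count in name_counts.items() if count > 1]

print(duplicates)             # ['crane', 'maillot']
print(class_to_idx["crane"])  # 517
print(134 in idx_to_class)    # False, so idx_to_class[134] raises KeyError
```

The maillot pair loses only index 638, because 639 stays reachable through its second name, tank suit.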


Expected behavior

No KeyError should be raised here.

Environment

Please copy and paste the output from our environment collection script (or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

PyTorch version: 1.8.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.10

Python version: 3.7.4 (default, Aug 13 2019, 20:35:49)  [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.15.0-147-generic-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti

Nvidia driver version: 460.67
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.3.1
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.17.2
[pip3] numpydoc==0.9.1
[pip3] torch==1.8.1+cu102
[pip3] torchattacks==2.14.2
[pip3] torchaudio==0.8.1
[pip3] torchsummary==1.5.1
[pip3] torchvision==0.9.1+cu102
[conda] blas                      1.0                         mkl    https://repo.anaconda.com/pkgs/main
[conda] cudatoolkit               10.0.130                      0    defaults
[conda] mkl                       2019.4                      243    https://repo.anaconda.com/pkgs/main
[conda] mkl-service               2.3.0            py37he904b0f_0    https://repo.anaconda.com/pkgs/main
[conda] mkl_fft                   1.0.14           py37ha843d7b_0    https://repo.anaconda.com/pkgs/main
[conda] mkl_random                1.1.0            py37hd6b4f25_0    https://repo.anaconda.com/pkgs/main
[conda] numpy                     1.17.2           py37haad9e8e_0    https://repo.anaconda.com/pkgs/main
[conda] numpy-base                1.17.2           py37hde5b4d6_0    https://repo.anaconda.com/pkgs/main
[conda] numpydoc                  0.9.1                      py_0    https://repo.anaconda.com/pkgs/main
[conda] torch                     1.8.1+cu102              pypi_0    pypi
[conda] torchattacks              2.14.2                   pypi_0    pypi
[conda] torchaudio                0.8.1                    pypi_0    pypi
[conda] torchsummary              1.5.1                    pypi_0    pypi
[conda] torchvision               0.9.1+cu102              pypi_0    pypi

Additional context

cc @pmeier

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
huihui-v commented, Jul 19, 2021

@pmeier Thanks for replying!

I was just trying to create some new datasets following the official torchvision implementation, with similar APIs but some modifications for my own use (in my case, using a subset of ImageNet for quick testing). Anyway, I just got it working with the classes attribute, and it works fine!
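Looking classes up by position sidesteps the collision, since every index owns its own entry. A minimal sketch, assuming classes is a list of name tuples indexed by class index (as in torchvision's ImageNet); the placeholder names and the helpers names_for_index and indices_for_name are hypothetical.

```python
from typing import List, Tuple

# Hypothetical stand-in for `datasets.ImageNet(root, 'train').classes`:
# position == class index, each entry is a tuple of names.
# Entries other than 134 and 517 are placeholders.
classes: List[Tuple[str, ...]] = [(f"class_{i}",) for i in range(1000)]
classes[134] = ("crane",)
classes[517] = ("crane",)


def names_for_index(classes: List[Tuple[str, ...]], idx: int) -> Tuple[str, ...]:
    """Positional lookup: every index has its own entry, so duplicates cannot collide."""
    return classes[idx]


def indices_for_name(classes: List[Tuple[str, ...]], name: str) -> List[int]:
    """Return every class index that lists `name` among its names."""
    return [idx for idx, names in enumerate(classes) if name in names]


print(names_for_index(classes, 134))       # ('crane',)
print(indices_for_name(classes, "crane"))  # [134, 517]
```

Returning a list from the reverse lookup keeps both crane indices instead of silently dropping one.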

I think it's better to add a suffix when duplicate class names are encountered, for example, crane_1 and crane_2. But it's not a big problem, so I'll close this issue now.

Thank you for your attention!

0 reactions
pmeier commented, Jul 19, 2021

"I think it's better to add a suffix when duplicate class names are encountered, for example, crane_1 and crane_2. But it's not a big problem, so I'll close this issue now."

WordNet actually describes a taxonomy of objects. So in my example above, grass snake is a valid category for n01729977 and n01735189. Thus, we cannot simply enumerate these into grass_snake_1 and grass_snake_2, because they are the same category.
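Following this reasoning, a lookup that respects the taxonomy keeps the WordNet ID as the unique key and lets a name map to several IDs. This is an illustration, not torchvision's API: WNID_TO_NAMES and name_to_wnids are hypothetical, and the names are those I understand the ImageNet-1k label set to use for these two wnids. (If I read its source correctly, torchvision's ImageNet also keeps wnids and wnid_to_idx alongside classes.)

```python
from collections import defaultdict

# Hypothetical excerpt of a WordNet-ID -> names table; only the two
# wnids from the comment are shown.
WNID_TO_NAMES = {
    "n01729977": ("green snake", "grass snake"),
    "n01735189": ("garter snake", "grass snake"),
}

# One-to-many mapping: a name may legitimately label several synsets,
# so store a list of wnids instead of forcing a single value.
name_to_wnids = defaultdict(list)
for wnid, names in WNID_TO_NAMES.items():
    for name in names:
        name_to_wnids[name].append(wnid)

print(name_to_wnids["grass snake"])   # ['n01729977', 'n01735189']
print(name_to_wnids["garter snake"])  # ['n01735189']
```

Renaming the entries to grass_snake_1 and grass_snake_2 would invent two categories where WordNet has one name shared by two synsets; the list keeps that relationship explicit.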


