Dataset download returns 403

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

import torch.backends.cudnn
import torch.utils.data
import torchvision

# prepare parameters
n_epochs = 1  # 3
batch_size_train = 64
batch_size_test = 1000
learning_rate = 0.01
momentum = 0.5
log_interval = 10

random_seed = 1
torch.backends.cudnn.enabled = False
torch.manual_seed(random_seed)

# prepare dataset
train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST('./tmp/files/', train=True, download=True,
                               transform=torchvision.transforms.Compose([
                                   torchvision.transforms.ToTensor(),
                                   torchvision.transforms.Normalize((0.1307,), (0.3081,))
                               ])),
    batch_size=batch_size_train, shuffle=True)

test_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST('./tmp/files/', train=False, download=True,
                               transform=torchvision.transforms.Compose([
                                   torchvision.transforms.ToTensor(),
                                   torchvision.transforms.Normalize(
                                       (0.1307,), (0.3081,))
                               ])),
    batch_size=batch_size_test, shuffle=True)

Expected behavior

The dataset downloads without error.

Environment

Please copy and paste the output from our environment collection script (or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

PyTorch version: 1.5.1+cu101
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 16.04.6 LTS (x86_64)
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Clang version: Could not collect
CMake version: version 3.5.1

Python version: 3.6 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti

Nvidia driver version: 440.33.01
cuDNN version: /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] numpydoc==1.1.0
[pip3] torch==1.5.1+cu101
[pip3] torchtext==0.7.0
[pip3] torchvision==0.6.1+cu101
[conda] blas 1.0 mkl
[conda] mkl 2020.1 217
[conda] mkl-service 2.3.0 py36he904b0f_0
[conda] mkl_fft 1.1.0 py36h23d657b_0
[conda] mkl_random 1.1.1 py36h0573a6f_0
[conda] numpy 1.18.5 py36ha1c710e_0
[conda] numpy-base 1.18.5 py36hde5b4d6_0
[conda] numpydoc 1.1.0 py_0
[conda] torch 1.5.1+cu101 pypi_0 pypi
[conda] torchtext 0.7.0 pypi_0 pypi
[conda] torchvision 0.6.1+cu101 pypi_0 pypi

Additional context

Top GitHub Comments

5 reactions

vfdev-5 commented, Mar 4, 2021

For those who cannot install torchvision master with the fix, here is a workaround: download and preprocess the dataset manually.

Go to the MNIST folder (the MNIST subdirectory of the dataset root, i.e. ./tmp/files/MNIST for the code above).
Create two scripts there:

download.sh


# sudo apt-get update && sudo apt-get install -y wget p7zip-full

mkdir -p raw
mkdir -p processed

cd raw

wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz

7z x train-images-idx3-ubyte.gz
7z x train-labels-idx1-ubyte.gz
7z x t10k-images-idx3-ubyte.gz
7z x t10k-labels-idx1-ubyte.gz

cd ..

python process.py

process.py

import os
import torch
from torchvision.datasets.mnist import read_image_file, read_label_file

raw_folder = "raw"
processed_folder = "processed"
training_file = 'training.pt'
test_file = 'test.pt'

### Code from https://github.com/pytorch/vision/blob/7d4154735f421b254c408c16e0980b1ca0dd9b8e/torchvision/datasets/mnist.py#L134
# process and save as torch files
print('Processing...')

training_set = (
    read_image_file(os.path.join(raw_folder, 'train-images.idx3-ubyte')),
    read_label_file(os.path.join(raw_folder, 'train-labels.idx1-ubyte'))
)
test_set = (
    read_image_file(os.path.join(raw_folder, 't10k-images.idx3-ubyte')),
    read_label_file(os.path.join(raw_folder, 't10k-labels.idx1-ubyte'))
)
with open(os.path.join(processed_folder, training_file), 'wb') as f:
    torch.save(training_set, f)
with open(os.path.join(processed_folder, test_file), 'wb') as f:
    torch.save(test_set, f)

print('Done!')

Make sure wget, 7z, torchvision and torch are installed.
Run sh download.sh; it should download and preprocess the dataset.
Use torchvision's MNIST code as usual.

HTH
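
If 7z is not available, the unpacking step of download.sh can also be done from Python. The following is only a minimal sketch using the standard library and is not part of the original workaround; the helper name unpack_mnist_archives is made up for illustration. It assumes the four .gz archives are already in raw/ and writes each one under the dotted file name that process.py reads (for example train-images.idx3-ubyte).

```python
import gzip
import os
import shutil

# Archive name (as fetched by download.sh) -> file name read by process.py.
ARCHIVES = {
    "train-images-idx3-ubyte.gz": "train-images.idx3-ubyte",
    "train-labels-idx1-ubyte.gz": "train-labels.idx1-ubyte",
    "t10k-images-idx3-ubyte.gz": "t10k-images.idx3-ubyte",
    "t10k-labels-idx1-ubyte.gz": "t10k-labels.idx1-ubyte",
}


def unpack_mnist_archives(raw_folder="raw"):
    """Decompress every known archive found in raw_folder; return the paths written."""
    written = []
    for archive, target in ARCHIVES.items():
        src = os.path.join(raw_folder, archive)
        if not os.path.exists(src):
            continue  # archive not downloaded; skipped rather than treated as an error
        dst = os.path.join(raw_folder, target)
        with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        written.append(dst)
    return written
```

Calling unpack_mnist_archives() from the MNIST folder, in place of the four 7z commands, should leave raw/ in the state process.py expects.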

0 reactions

liqing9399 commented, Mar 21, 2021

When I used the process.py script above, I got the following error. What is the cause?

RuntimeError: shape '[60000, 28, 28]' is invalid for input of size 4482028
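
The message says that the image file yielded 4,482,028 values where 60000 × 28 × 28 = 47,040,000 were expected, so the file read from raw/ was shorter than a full training set. One possible cause, which this thread does not confirm, is an incomplete or failed download. A minimal sketch for checking this is below; the helper name check_idx_images is hypothetical. It relies on the IDX image layout: four big-endian 32-bit integers (magic number 2051, image count, rows, columns) followed by one byte per pixel.

```python
import os
import struct


def check_idx_images(path):
    """Return (expected_bytes, actual_bytes) for the pixel payload of an IDX image file."""
    with open(path, "rb") as f:
        header = f.read(16)
    if len(header) < 16:
        raise ValueError("file is too short to contain an IDX image header")
    # Header: magic number, image count, rows, columns (big-endian uint32 each).
    magic, count, rows, cols = struct.unpack(">IIII", header)
    if magic != 2051:
        raise ValueError(f"unexpected magic number {magic}; not an IDX image file")
    expected = count * rows * cols
    actual = os.path.getsize(path) - 16
    return expected, actual
```

For a complete training image file both numbers should be 47040000. If the second number is smaller, deleting the file and downloading it again is a reasonable first step.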