MNIST dataset no longer downloads (repeat of old fixed March 2020 problem)

🐛 Bug

This seems to be a recurrence of #1938, which was fixed and closed back in March 2020 but has now reappeared. Several people in #1938 report that the problem came back at some point in the last 12 hours.

To Reproduce

Steps to reproduce the behavior:

1. Open a fresh Google Colab notebook.
2. Try something like the following:

import torchvision
from torchvision import datasets, transforms
transform = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,)),
                              ])
trainset = datasets.MNIST('PATH_TO_STORE_TRAINSET', download=True, train=True, transform=transform)

Results in:

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to PATH_TO_STORE_TRAINSET/MNIST/raw/train-images-idx3-ubyte.gz
0/? [00:00<?, ?it/s]
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-16-492e382ce34e> in <module>()
      4                               transforms.Normalize((0.5,), (0.5,)),
      5                               ])
----> 6 trainset = datasets.MNIST('PATH_TO_STORE_TRAINSET', download=True, train=True, transform=transform)

11 frames
/usr/local/lib/python3.7/dist-packages/torchvision/datasets/mnist.py in __init__(self, root, train, transform, target_transform, download)
     77 
     78         if download:
---> 79             self.download()
     80 
     81         if not self._check_exists():

/usr/local/lib/python3.7/dist-packages/torchvision/datasets/mnist.py in download(self)
    144         for url, md5 in self.resources:
    145             filename = url.rpartition('/')[2]
--> 146             download_and_extract_archive(url, download_root=self.raw_folder, filename=filename, md5=md5)
    147 
    148         # process and save as torch files

/usr/local/lib/python3.7/dist-packages/torchvision/datasets/utils.py in download_and_extract_archive(url, download_root, extract_root, filename, md5, remove_finished)
    254         filename = os.path.basename(url)
    255 
--> 256     download_url(url, download_root, filename, md5)
    257 
    258     archive = os.path.join(download_root, filename)

/usr/local/lib/python3.7/dist-packages/torchvision/datasets/utils.py in download_url(url, root, filename, md5)
     82                 )
     83             else:
---> 84                 raise e
     85         # check integrity of downloaded file
     86         if not check_integrity(fpath, md5):

/usr/local/lib/python3.7/dist-packages/torchvision/datasets/utils.py in download_url(url, root, filename, md5)
     70             urllib.request.urlretrieve(
     71                 url, fpath,
---> 72                 reporthook=gen_bar_updater()
     73             )
     74         except (urllib.error.URLError, IOError) as e:  # type: ignore[attr-defined]

/usr/lib/python3.7/urllib/request.py in urlretrieve(url, filename, reporthook, data)
    245     url_type, path = splittype(url)
    246 
--> 247     with contextlib.closing(urlopen(url, data)) as fp:
    248         headers = fp.info()
    249 

/usr/lib/python3.7/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

/usr/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

/usr/lib/python3.7/urllib/request.py in http_response(self, request, response)
    639         if not (200 <= code < 300):
    640             response = self.parent.error(
--> 641                 'http', request, response, code, msg, hdrs)
    642 
    643         return response

/usr/lib/python3.7/urllib/request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

/usr/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    501         for handler in handlers:
    502             func = getattr(handler, meth_name)
--> 503             result = func(*args)
    504             if result is not None:
    505                 return result

/usr/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 403: Forbidden

Expected behavior

The dataset should just be loaded (as indeed it was this morning).

Environment

Collecting environment information...
PyTorch version: 1.7.1+cu101
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.12.0

Python version: 3.7 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: 11.0.221
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.4
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.7.1+cu101
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.3.1
[pip3] torchvision==0.8.2+cu101
[conda] Could not collect



cc @pmeier

Top GitHub Comments

12 reactions

andresgtn commented, Mar 3, 2021

Copy this snippet to the top of your notebook, run it, and then load your datasets as usual:

from six.moves import urllib
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
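
As a hedged sketch, the same workaround can be written with the standard library's `urllib.request` (which is what `six.moves.urllib` resolves to on Python 3) and wrapped in a function. The helper names `default_user_agent` and `install_user_agent` are hypothetical and not part of torchvision. The premise that the server rejects the default `Python-urllib/3.x` agent is an assumption drawn from this thread, not something verified here.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent header a freshly built opener would send."""
    # A new OpenerDirector starts with one default header:
    # ("User-agent", "Python-urllib/<major>.<minor>").
    return dict(urllib.request.build_opener().addheaders)["User-agent"]


def install_user_agent(user_agent="Mozilla/5.0"):
    """Install a process-wide opener that sends the given User-Agent.

    The default value mirrors the snippet in the comment above; whether
    the server accepts it is an assumption, not a guarantee.
    """
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-agent", user_agent)]
    # After this call, urllib.request.urlopen and urlretrieve use this opener.
    urllib.request.install_opener(opener)
    return opener
```

Calling `install_user_agent()` once before `datasets.MNIST(..., download=True)` should have the same effect as the snippet above, because the traceback shows torchvision 0.8.2 downloading through `urllib.request.urlretrieve`, which goes through the globally installed opener.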

2 reactions

pmeier commented, Mar 3, 2021

We have decided to do the user agent fix anyway. It will also make its way into the upcoming release (#3499).
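
The maintainers' actual change is in #3499 and is not reproduced here. Purely as an illustration of what a user agent fix can look like without touching global state, the sketch below attaches the header to a single request. `USER_AGENT`, `build_request`, and `fetch` are hypothetical names, and the agent string is an assumed value.

```python
import shutil
import urllib.request

# Assumed value for illustration; not taken from torchvision.
USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, dest, user_agent=USER_AGENT):
    """Download url to dest, sending the header on this request only."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        # Stream the body to disk instead of reading it into memory at once.
        shutil.copyfileobj(response, out)
    return dest
```

Compared with installing a global opener, this keeps the header change local to one call, so other code in the same process that uses `urllib` is unaffected.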
