Unable to download IWSLT datasets
🐛 Bug
Describe the bug
Unable to download IWSLT2016 or IWSLT2017 datasets.
To Reproduce
Steps to reproduce the behavior:
from torchtext.datasets import IWSLT2016
train, valid, test = IWSLT2016()
src, tgt = next(iter(train))
The same error occurs when trying to use IWSLT2017.
Expected behavior
The program returns the next src, tgt pair in the training data.
Screenshots
Full error logs are in this gist.
Environment
Included in gist above.
Additional context
No additional context.
Issue Analytics
- Created a year ago
- Comments:18 (12 by maintainers)
Top GitHub Comments
As a temporary fix, I’m just downloading the datasets manually via the links in the documentation:
IWSLT2016
IWSLT2017
Then you can put the downloaded .tgz file into the proper directory: ~/.torchtext/cache/IWSLT2016/ for 2016, and similarly for 2017. torchtext will then recognize the files and not download them from GDrive.

@Nayef211 thanks, it does sound like exactly what I’m observing with IWSLT.
But I tried what is suggested in #1735 with (note the order of end_caching here and in the original code):
I still get the same behaviour: the inner load_from_tar() never gets iterated over.
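
For anyone trying the manual workaround from the first comment, here is a minimal sketch of the "put the .tgz into the cache directory" step. It uses only the Python standard library. The helper name place_in_torchtext_cache and the archive path in the usage comment are hypothetical; the cache location ~/.torchtext/cache/IWSLT2016/ is the one quoted in that comment. The archive presumably needs to keep the filename it was downloaded with, so the sketch copies it unchanged.

```python
import shutil
from pathlib import Path


def place_in_torchtext_cache(archive, dataset_name, cache_root=None):
    """Copy a manually downloaded archive into the torchtext cache.

    `archive` is the path to the downloaded .tgz file.
    `dataset_name` is the cache subdirectory, e.g. "IWSLT2016".
    `cache_root` defaults to ~/.torchtext/cache (the location quoted in
    the comment above); it is a parameter only so the sketch is testable.
    """
    if cache_root is None:
        cache_root = Path.home() / ".torchtext" / "cache"
    target_dir = Path(cache_root) / dataset_name
    # Create ~/.torchtext/cache/<dataset_name>/ if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    # Keep the original filename (assumption: torchtext looks the file up by name).
    target = target_dir / Path(archive).name
    shutil.copy2(archive, target)
    return target


# Hypothetical usage; replace the path with wherever the browser saved the file:
# place_in_torchtext_cache("/path/to/downloaded-archive.tgz", "IWSLT2016")
```

The same call with "IWSLT2017" as dataset_name would cover the 2017 archive, assuming the cache layout is analogous, as the comment suggests.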