Translation datasets not automatically downloading
Code:
from torchtext.data import Field
from torchtext.datasets import Multi30k
DE = Field(init_token='<sos>', eos_token='<eos>')
EN = Field(init_token='<sos>', eos_token='<eos>')
train, val, test = Multi30k.splits(exts=('.de', '.en'), fields=(DE, EN))
Error:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-3-637d49b65435> in <module>()
----> 1 train, val, test = Multi30k.splits(exts=('.de', '.en'), fields=(DE, EN))
~/miniconda3/envs/pytorch/lib/python3.6/site-packages/torchtext/datasets/translation.py in splits(cls, exts, fields, root, train, validation, test, **kwargs)
99 """
100 return super(Multi30k, cls).splits(
--> 101 exts, fields, root, train, validation, test, **kwargs)
102
103
~/miniconda3/envs/pytorch/lib/python3.6/site-packages/torchtext/datasets/translation.py in splits(cls, exts, fields, path, root, train, validation, test, **kwargs)
62
63 train_data = None if train is None else cls(
---> 64 os.path.join(path, train), exts, fields, **kwargs)
65 val_data = None if validation is None else cls(
66 os.path.join(path, validation), exts, fields, **kwargs)
~/miniconda3/envs/pytorch/lib/python3.6/site-packages/torchtext/datasets/translation.py in __init__(self, path, exts, fields, **kwargs)
31
32 examples = []
---> 33 with open(src_path) as src_file, open(trg_path) as trg_file:
34 for src_line, trg_line in zip(src_file, trg_file):
35 src_line, trg_line = src_line.strip(), trg_line.strip()
FileNotFoundError: [Errno 2] No such file or directory: '.data/val.de'
The data does not seem to be downloaded automatically for either the Multi30k or the WMT14 dataset.
PyTorch version: 0.3.1, TorchText version: 0.2.3
EDIT
I downgraded TorchText to version 0.2.1 and no longer get the error. I had a quick look at the commits between 0.2.1 and 0.2.3 but couldn't figure out which one introduced the break.
Issue Analytics
- State:
- Created 5 years ago
- Comments:7 (3 by maintainers)
Top GitHub Comments
@mttk I figured out that I had to add the 'root' argument to the splits call, so I changed the line to train_data, test_data = datasets.IMDB.splits(TEXT, LABEL, root='data') (the data will be downloaded into the root dir). After that, the data was downloaded into the specified root directory. Thanks anyway 😄
I got around this quite easily by downloading with Multi30k.download(DATAROOT) and then using TranslationDataset.splits instead of Multi30k.splits. Pass the root path to the path argument instead of the root argument.
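For anyone who wants to try the workaround above, here is a rough sketch of how it might look. It assumes the legacy torchtext API from this issue (Field, Multi30k, TranslationDataset) and that Multi30k.download returns the directory the archives were unpacked into; neither is verified against every release. The names missing_split_files and load_multi30k are hypothetical helpers made up for this example, and the split names ('train', 'val', 'test') are assumptions too, since some releases may unpack the test set under a different name.

```python
import os


def missing_split_files(path, exts=('.de', '.en'), splits=('train', 'val', 'test')):
    """Hypothetical helper: list the expected parallel files absent under `path`."""
    expected = [os.path.join(path, split + ext) for split in splits for ext in exts]
    return [f for f in expected if not os.path.isfile(f)]


def load_multi30k(root='.data', test='test'):
    """Hypothetical helper: download explicitly, then load via TranslationDataset."""
    # Imported lazily so the file check above works without torchtext installed.
    from torchtext.data import Field
    from torchtext.datasets import Multi30k, TranslationDataset

    DE = Field(init_token='<sos>', eos_token='<eos>')
    EN = Field(init_token='<sos>', eos_token='<eos>')

    # Assumption: download() returns the directory holding the extracted files.
    path = Multi30k.download(root)

    # Fail early with a readable message instead of a bare FileNotFoundError.
    missing = missing_split_files(path, splits=('train', 'val', test))
    if missing:
        raise FileNotFoundError('Expected files not found: ' + ', '.join(missing))

    # Pass the directory as `path` (not `root`) so splits() does not try to
    # download again and reads the files that are already on disk.
    return TranslationDataset.splits(
        exts=('.de', '.en'), fields=(DE, EN), path=path,
        train='train', validation='val', test=test)
```

Calling train, val, test = load_multi30k('data') would then stand in for the Multi30k.splits line from the original post. If the file check complains, the listed paths show which split names your installed version actually unpacked.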