Cannot download IWSLT dataset

Running

train_data, valid_data, test_data = IWSLT.splits(exts = ('.de', '.en'), fields = (SRC, TRG))

on Google Colab leads to this error:

TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f9467921c18>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
MaxRetryError: HTTPSConnectionPool(host='wit3.fbk.eu', port=443): Max retries exceeded with url: /archive/2016-01//texts/de/en/de-en.tgz (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f9467921c18>: Failed to establish a new connection: [Errno 110] Connection timed out',))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    514                 raise SSLError(e, request=request)
    515 
--> 516             raise ConnectionError(e, request=request)
    517 
    518         except ClosedPoolError as e:

ConnectionError: HTTPSConnectionPool(host='wit3.fbk.eu', port=443): Max retries exceeded with url: /archive/2016-01//texts/de/en/de-en.tgz (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f9467921c18>: Failed to establish a new connection: [Errno 110] Connection timed out',))
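The traceback shows the TCP connection to wit3.fbk.eu on port 443 timing out before any data is transferred. A quick way to tell a host-side outage from a torchtext problem is to try opening a plain socket from the same runtime. The following is a minimal sketch using only the standard library; the helper name `can_connect` and the 5-second default timeout are assumptions made for illustration and are not part of torchtext or requests.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False
```

Calling `can_connect("wit3.fbk.eu")` in the Colab notebook and getting `False` would suggest the host itself is unreachable from that runtime, in which case retrying `IWSLT.splits` is unlikely to help.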

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 13 (12 by maintainers)

Top GitHub Comments

1 reaction
garyhlai commented, Dec 25, 2020

@ghlai9665 thanks for the new URL. Can you submit a PR that updates the IWSLT dataset URL to the new one? The new link downloads all the files, so we will need to fetch the specific languages from the local copy. Let me know if you need any help there.

I used the download_from_url function and it works:

import torchtext

# Download the archive from the Google Drive link
url = 'https://drive.google.com/uc?id=1l5y6Giag9aRPwGtuZHswh3w5v3qEz8D8'
torchtext.utils.download_from_url(url)

Remember to add a CI test in test/data/test_builtin_datasets.py (similar to test_multi30k). We have an experimental IWSLT dataset (here).

Just saw this. Working on it!
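
The comment above notes that the new link downloads all the files, so a specific language pair has to be picked out locally. Below is a rough sketch of that lookup, not the fix that was merged. It assumes the bundle has already been extracted to a local directory and that it keeps a `texts/<src>/<tgt>/<src>-<tgt>.tgz` layout like the path in the original wit3.fbk.eu URL; that layout is unconfirmed for the Google Drive archive. The helper name `find_pair_archive` is hypothetical and not part of torchtext.

```python
from pathlib import Path
from typing import Optional


def find_pair_archive(root: str, src: str, tgt: str) -> Optional[Path]:
    """Return a `<src>-<tgt>.tgz` file found under `root`, or None if absent."""
    name = f"{src}-{tgt}.tgz"
    # Collect every file with the expected name, in a stable order.
    matches = sorted(Path(root).rglob(name))
    # Prefer a file stored under .../<src>/<tgt>/ (the assumed layout).
    for path in matches:
        if path.parent.name == tgt and path.parent.parent.name == src:
            return path
    # Otherwise fall back to the first file with the right name, if any.
    return matches[0] if matches else None
```

For the call in the original report, `find_pair_archive(extracted_dir, "de", "en")` would return the path to `de-en.tgz` if it exists, where `extracted_dir` stands for wherever the bundle was unpacked.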

0 reactions
zhangguanheng66 commented, Dec 29, 2020