
pubmedqa task data fails to download

See original GitHub issue

Using lm-eval==0.2.0:

python ./tasks/eval_harness/download.py --task_list pubmedqa
Downloading and preparing dataset pubmed_qa/pqa_labeled (download: 656.02 MiB, generated: 1.99 MiB, post-processed: Unknown size, total: 658.01 MiB) to /gpfswork/rech/six/commun/datasets/pubmed_qa/pqa_labeled/1.0.0/2e65addecca4197502cd10ab8ef1919a47c28672f62d7abac7cc9afdcf24fb2d...
100%|████████████████████████████████████████| 3/3 [00:00<00:00, 4495.50it/s]
100%|████████████████████████████████████████| 3/3 [00:00<00:00, 671.73it/s]
Traceback (most recent call last):
  File "./tasks/eval_harness/download.py", line 20, in <module>
    main()
  File "./tasks/eval_harness/download.py", line 17, in main
    tasks.get_task_dict(task_list)
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/__init__.py", line 325, in get_task_dict
    task_name_dict = {
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/__init__.py", line 326, in <dictcomp>
    task_name: get_task(task_name)()
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/common.py", line 11, in __init__
    super().__init__()
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/base.py", line 350, in __init__
    self.download()
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/common.py", line 14, in download
    self.data = datasets.load_dataset(path=self.DATASET_PATH, name=self.DATASET_NAME)
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/load.py", line 1632, in load_dataset
    builder_instance.download_and_prepare(
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/builder.py", line 607, in download_and_prepare
    self._download_and_prepare(
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/builder.py", line 679, in _download_and_prepare
    verify_checksums(
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/utils/info_utils.py", line 40, in verify_checksums
    raise NonMatchingChecksumError(error_msg + str(bad_urls))
datasets.utils.info_utils.NonMatchingChecksumError: Checksums didn't match for dataset source files:
['https://drive.google.com/uc?export=download&id=1RsGLINVce-0GsDkCLDuLZmoLuzfmoCuQ', 'https://drive.google.com/uc?export=download&id=15v1x6aQDlZymaHGP7cZJZZYFfeJt2NdS']

Probably those files got updated and now require new checksums?

I checked that the files are there.

Thanks.
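
Until the hosting is fixed, one possible workaround is to skip the checksum verification when loading the dataset directly. A minimal sketch, assuming the datasets 1.x API shown in the traceback (where load_dataset still accepts ignore_verifications):

import datasets

# Skip verify_checksums(), which is what raised NonMatchingChecksumError above.
# "pubmed_qa" / "pqa_labeled" are the DATASET_PATH / DATASET_NAME from the log.
data = datasets.load_dataset(
    "pubmed_qa",
    "pqa_labeled",
    ignore_verifications=True,
)
print(data)

This only helps if the source files still download intact; as the maintainer comment below explains, the actual culprit here was a Google Drive traffic cap, and the stale checksums eventually need regenerating on the Hub side (typically with datasets-cli test pubmed_qa --save_infos --all_configs).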

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments:21 (10 by maintainers)

Top GitHub Comments

2 reactions
stas00 commented, Apr 28, 2022

So one can download these files manually via gdown or a browser, but not via datasets, apparently because some traffic limit has been reached.

We will have to move the data to a different storage that doesn’t have such caps.

In any case, the issue is not with lm-eval.

I will close it once we get that fixed on the Hub.
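
For reference, the manual gdown route would look roughly like this. A sketch only: gdown has to be installed separately (pip install gdown), and the output filenames below are placeholders, not the names the lm-eval task expects.

import gdown

# The two Google Drive source files listed in the NonMatchingChecksumError above.
urls = [
    "https://drive.google.com/uc?export=download&id=1RsGLINVce-0GsDkCLDuLZmoLuzfmoCuQ",
    "https://drive.google.com/uc?export=download&id=15v1x6aQDlZymaHGP7cZJZZYFfeJt2NdS",
]

for i, url in enumerate(urls):
    # gdown fetches straight from Google Drive, sidestepping the download
    # manager inside datasets that was tripping over the traffic limit.
    gdown.download(url, f"pubmed_qa_source_{i}.json", quiet=False)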

1 reaction
jon-tow commented, May 11, 2022

Glad to hear it worked out! Thanks for reporting everything 😃


