
pubmedqa task data fails to download

See original GitHub issue

Using lm-eval==0.2.0:

python ./tasks/eval_harness/download.py --task_list pubmedqa
Downloading and preparing dataset pubmed_qa/pqa_labeled (download: 656.02 MiB, generated: 1.99 MiB, post-processed: Unknown size, total: 658.01 MiB) to /gpfswork/rech/six/commun/datasets/pubmed_qa/pqa_labeled/1.0.0/2e65addecca4197502cd10ab8ef1919a47c28672f62d7abac7cc9afdcf24fb2d...
100%|████████████████████████████████████████| 3/3 [00:00<00:00, 4495.50it/s]
100%|████████████████████████████████████████| 3/3 [00:00<00:00, 671.73it/s]
Traceback (most recent call last):
  File "./tasks/eval_harness/download.py", line 20, in <module>
    main()
  File "./tasks/eval_harness/download.py", line 17, in main
    tasks.get_task_dict(task_list)
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/__init__.py", line 325, in get_task_dict
    task_name_dict = {
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/__init__.py", line 326, in <dictcomp>
    task_name: get_task(task_name)()
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/common.py", line 11, in __init__
    super().__init__()
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/base.py", line 350, in __init__
    self.download()
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/common.py", line 14, in download
    self.data = datasets.load_dataset(path=self.DATASET_PATH, name=self.DATASET_NAME)
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/load.py", line 1632, in load_dataset
    builder_instance.download_and_prepare(
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/builder.py", line 607, in download_and_prepare
    self._download_and_prepare(
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/builder.py", line 679, in _download_and_prepare
    verify_checksums(
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/utils/info_utils.py", line 40, in verify_checksums
    raise NonMatchingChecksumError(error_msg + str(bad_urls))
datasets.utils.info_utils.NonMatchingChecksumError: Checksums didn't match for dataset source files:
['https://drive.google.com/uc?export=download&id=1RsGLINVce-0GsDkCLDuLZmoLuzfmoCuQ', 'https://drive.google.com/uc?export=download&id=15v1x6aQDlZymaHGP7cZJZZYFfeJt2NdS']

Probably those files got updated and now require new checksums?

I checked that the files are there.

Thanks.
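
Until the hosting is fixed, one possible workaround is to skip the checksum verification when loading the dataset directly. A minimal sketch, assuming the datasets 1.x API shown in the traceback (where load_dataset still accepts ignore_verifications):

import datasets

# Skip verify_checksums(), which is what raised NonMatchingChecksumError above.
# "pubmed_qa" / "pqa_labeled" are the DATASET_PATH / DATASET_NAME from the log.
data = datasets.load_dataset(
    "pubmed_qa",
    "pqa_labeled",
    ignore_verifications=True,
)
print(data)

This only helps if the source files still download intact; as the maintainer comment below explains, the actual culprit here was a Google Drive traffic cap, and the stale checksums eventually need regenerating on the Hub side (typically with datasets-cli test pubmed_qa --save_infos --all_configs).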

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments:21 (10 by maintainers)

Top GitHub Comments

2 reactions
stas00 commented, Apr 28, 2022

So one can download these files manually via gdown or a browser, but not via datasets, apparently because some traffic limit has been reached.

We will have to move the data to a different storage that doesn’t have such caps.

In any case, the issue is not with lm-eval.

I will close it once we get that fixed on the Hub.
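
For reference, the manual gdown route would look roughly like this. A sketch only: gdown has to be installed separately (pip install gdown), and the output filenames below are placeholders, not the names the lm-eval task expects.

import gdown

# The two Google Drive source files listed in the NonMatchingChecksumError above.
urls = [
    "https://drive.google.com/uc?export=download&id=1RsGLINVce-0GsDkCLDuLZmoLuzfmoCuQ",
    "https://drive.google.com/uc?export=download&id=15v1x6aQDlZymaHGP7cZJZZYFfeJt2NdS",
]

for i, url in enumerate(urls):
    # gdown fetches straight from Google Drive, sidestepping the download
    # manager inside datasets that was tripping over the traffic limit.
    gdown.download(url, f"pubmed_qa_source_{i}.json", quiet=False)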

1 reaction
jon-tow commented, May 11, 2022

Glad to hear it worked out! Thanks for reporting everything 😃


