pubmedqa task data fails to download
See original GitHub issue

Using lm-eval==0.2.0:

python ./tasks/eval_harness/download.py --task_list pubmedqa
Downloading and preparing dataset pubmed_qa/pqa_labeled (download: 656.02 MiB, generated: 1.99 MiB, post-processed: Unknown size, total: 658.01 MiB) to /gpfswork/rech/six/commun/datasets/pubmed_qa/pqa_labeled/1.0.0/2e65addecca4197502cd10ab8ef1919a47c28672f62d7abac7cc9afdcf24fb2d...
100%|████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 4495.50it/s]
100%|████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 671.73it/s]
Traceback (most recent call last):
  File "./tasks/eval_harness/download.py", line 20, in <module>
    main()
  File "./tasks/eval_harness/download.py", line 17, in main
    tasks.get_task_dict(task_list)
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/__init__.py", line 325, in get_task_dict
    task_name_dict = {
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/__init__.py", line 326, in <dictcomp>
    task_name: get_task(task_name)()
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/common.py", line 11, in __init__
    super().__init__()
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/base.py", line 350, in __init__
    self.download()
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/lm_eval/tasks/common.py", line 14, in download
    self.data = datasets.load_dataset(path=self.DATASET_PATH, name=self.DATASET_NAME)
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/load.py", line 1632, in load_dataset
    builder_instance.download_and_prepare(
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/builder.py", line 607, in download_and_prepare
    self._download_and_prepare(
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/builder.py", line 679, in _download_and_prepare
    verify_checksums(
  File "/gpfswork/rech/six/commun/conda/py38-pt111/lib/python3.8/site-packages/datasets/utils/info_utils.py", line 40, in verify_checksums
    raise NonMatchingChecksumError(error_msg + str(bad_urls))
datasets.utils.info_utils.NonMatchingChecksumError: Checksums didn't match for dataset source files:
['https://drive.google.com/uc?export=download&id=1RsGLINVce-0GsDkCLDuLZmoLuzfmoCuQ', 'https://drive.google.com/uc?export=download&id=15v1x6aQDlZymaHGP7cZJZZYFfeJt2NdS']
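A commonly suggested workaround for NonMatchingChecksumError in datasets versions of this vintage is to skip verification entirely. A hedged sketch (the wrapper name is ours; note that `ignore_verifications` was later replaced by `verification_mode` in newer datasets releases):

```python
def load_pubmedqa_unverified():
    """Load pqa_labeled while skipping checksum verification.

    Only use this if you trust the source files: it papers over the
    mismatch rather than fixing the stale recorded checksums.
    """
    import datasets  # lazy import: only needed when actually loading

    return datasets.load_dataset(
        path="pubmed_qa",
        name="pqa_labeled",
        ignore_verifications=True,  # skips the verify_checksums() call in the traceback
    )
```

This does not address the underlying Drive problem, so the download can still fail for other reasons (e.g. the traffic cap discussed below in the comments).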
Probably those files got updated and now require new checksums? I checked that the files are there.
Thanks.
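For reference, the checksums that `datasets` verifies here are SHA-256 digests of the raw downloaded files, so after fetching the files manually you can compute the digest locally to check whether the Drive content actually changed. A minimal sketch (the helper name is ours):

```python
import hashlib


def sha256_of(path):
    """Compute the SHA-256 hex digest of a file, streaming in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```

Comparing this digest against the one recorded in the dataset's metadata tells you whether the Drive files were really updated or the download was merely truncated.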
Issue Analytics
- State:
- Created a year ago
- Comments: 21 (10 by maintainers)
Top GitHub Comments
So one can download these files manually via gdown or a browser, but not via datasets, apparently due to reaching some traffic limit. We will have to move the data to a different storage that doesn't have such caps.

In any case, the issue is not in lm-eval. I will close it once we get this fixed on the Hub.
Glad to hear it worked out! Thanks for reporting everything!