load_dataset possibly broken for gated datasets?
Describe the bug
When trying to download the winoground dataset, I get this error unless I roll back the version of huggingface-hub:
/usr/local/lib/python3.7/dist-packages/huggingface_hub/utils/_validators.py in validate_repo_id(repo_id)
165 if repo_id.count("/") > 1:
166 raise HFValidationError(
--> 167 "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
168 f" '{repo_id}'. Use `repo_type` argument if needed."
169 )
HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'datasets/facebook/winoground'. Use `repo_type` argument if needed
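The check shown in the traceback can be reproduced in isolation. What follows is a minimal sketch, not the huggingface_hub implementation: `check_repo_id` and `RepoIdError` are hypothetical names, and only the slash-count rule visible in the traceback is modeled. It illustrates why the prefixed id `datasets/facebook/winoground` (two slashes) is rejected while the bare `facebook/winoground` passes.

```python
class RepoIdError(ValueError):
    """Hypothetical stand-in for HFValidationError (not the real class)."""


def check_repo_id(repo_id: str) -> str:
    # Models only the rule visible in the traceback:
    # a repo id with more than one "/" is rejected.
    if repo_id.count("/") > 1:
        raise RepoIdError(
            "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
            f" '{repo_id}'. Use `repo_type` argument if needed."
        )
    return repo_id


# Compare the id passed to load_dataset with the id that reached the validator.
for candidate in ("facebook/winoground", "datasets/facebook/winoground"):
    try:
        check_repo_id(candidate)
        print(f"accepted: {candidate}")
    except RepoIdError as err:
        print(f"rejected: {candidate} ({err})")
```

The id passed to `load_dataset` in this report has a single slash, so the extra `datasets/` segment appears to be added somewhere between that call and the validator.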
Steps to reproduce the bug
Install requirements:
pip install transformers
pip install datasets
# It works if you uncomment the following line, rolling back huggingface hub:
# pip install huggingface-hub==0.10.1
Then:
from datasets import load_dataset
auth_token = "" # Replace with an auth token, which you can get from your huggingface account: Profile -> Settings -> Access Tokens -> New Token
winoground = load_dataset("facebook/winoground", use_auth_token=auth_token)["test"]
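Since the failure depends on which huggingface-hub version is installed, it may help to record the installed versions alongside the reproduction. This is a minimal sketch, assuming Python 3.8 or later (where `importlib.metadata` is in the standard library; the Python 3.7 runtime in the traceback would need a backport). `installed_version` is a hypothetical helper name, not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the versions of the packages involved in this report.
for name in ("datasets", "huggingface-hub", "transformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this before and after the rollback to huggingface-hub 0.10.1 gives a concrete pair of environments to compare.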
Expected behavior
The dataset downloads successfully.
Environment info
Google Colab; see the notebook here: https://colab.research.google.com/drive/15wwOSte2CjTazdnCWYUm2VPlFbk2NGc0?usp=sharing
Top GitHub Comments
Btw, thanks very much for finding the hub rollback temporary fix and bringing the issue to our attention @KhoomeiK!
Awesome, big thanks to both @xiaohk and @mariosasko!