load_dataset possibly broken for gated datasets?

Describe the bug

When I try to download the Winoground dataset (`facebook/winoground`), I get the following error unless I roll back `huggingface-hub` to an older version:

```
/usr/local/lib/python3.7/dist-packages/huggingface_hub/utils/_validators.py in validate_repo_id(repo_id)
    165     if repo_id.count("/") > 1:
    166         raise HFValidationError(
--> 167             "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
    168             f" '{repo_id}'. Use `repo_type` argument if needed."
    169         )

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'datasets/facebook/winoground'. Use `repo_type` argument if needed
```
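The check that fails is visible in the traceback: any id containing more than one `/` is rejected. The snippet below is a simplified re-statement of that single rule, not `huggingface_hub`'s actual `validate_repo_id` (which may check more than this). The helper name `has_valid_slash_count` is hypothetical. It shows why the URL-style id `datasets/facebook/winoground` fails while `facebook/winoground` passes.

```python
# Simplified sketch of the rule quoted in the error message:
# a repo id must be "repo_name" or "namespace/repo_name".
# Hypothetical helper; it only mirrors the `repo_id.count("/") > 1`
# check shown in the traceback, not the full library validator.


def has_valid_slash_count(repo_id: str) -> bool:
    """Return True if repo_id contains at most one "/"."""
    return repo_id.count("/") <= 1


for candidate in ["winoground", "facebook/winoground", "datasets/facebook/winoground"]:
    status = "ok" if has_valid_slash_count(candidate) else "rejected"
    print(f"{candidate!r}: {status}")
```

The `datasets/` part is the Hub's URL prefix for dataset repos. It is not part of the repo id, which is why the error suggests passing the repo type separately.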

Steps to reproduce the bug

Install requirements:

```
pip install transformers
pip install datasets
# It works if you uncomment the following line, rolling back huggingface-hub:
# pip install huggingface-hub==0.10.1
```

Then:

```python
from datasets import load_dataset

# Replace with an access token from your Hugging Face account:
# Profile -> Settings -> Access Tokens -> New Token
auth_token = ""
winoground = load_dataset("facebook/winoground", use_auth_token=auth_token)["test"]
```
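The error message asks callers to use the `repo_type` argument instead of embedding `datasets/` in the id. As an illustration only, here is a hypothetical helper, `split_repo_type_prefix`, that turns a URL-style path into the `(repo_id, repo_type)` pair that `huggingface_hub` functions such as `hf_hub_download` accept. This is not the fix that was applied for this issue. It assumes the Hub's URL layout, where dataset repos live under `datasets/` and Spaces under `spaces/`, while model repos have no prefix.

```python
from typing import Optional, Tuple

# Hypothetical mapping based on the Hub's URL layout
# (huggingface.co/datasets/..., huggingface.co/spaces/...).
# Model repos have no prefix.
_PREFIX_TO_REPO_TYPE = {"datasets/": "dataset", "spaces/": "space"}


def split_repo_type_prefix(path: str) -> Tuple[str, Optional[str]]:
    """Split e.g. "datasets/facebook/winoground" into ("facebook/winoground", "dataset").

    Paths without a known prefix are returned unchanged with repo_type None,
    which callers would treat as the default (model) repo type.
    """
    for prefix, repo_type in _PREFIX_TO_REPO_TYPE.items():
        if path.startswith(prefix):
            return path[len(prefix):], repo_type
    return path, None


repo_id, repo_type = split_repo_type_prefix("datasets/facebook/winoground")
print(repo_id, repo_type)  # facebook/winoground dataset
```

The resulting pair would then be passed as `repo_id=repo_id, repo_type=repo_type`, so the id itself contains at most one `/` and passes the validation shown in the traceback.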

Expected behavior

The dataset downloads successfully.

Environment info

A standard Google Colab runtime; see the notebook: https://colab.research.google.com/drive/15wwOSte2CjTazdnCWYUm2VPlFbk2NGc0?usp=sharing

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Reactions: 1
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
TristanThrush commented, Nov 22, 2022

Btw, thanks very much for finding the hub rollback temporary fix and bringing the issue to our attention @KhoomeiK!

0 reactions
TristanThrush commented, Nov 28, 2022

Awesome, big thanks to both @xiaohk and @mariosasko!
