`_resolve_features` ignores the token

Describe the bug

When calling `_resolve_features()` on a gated dataset, i.e. a dataset that requires a token to be loaded, the token seems to be ignored even though it was passed to `load_dataset` beforehand.

Steps to reproduce the bug

import os

# Point to the CI hub; set this before importing datasets so the library picks it up.
os.environ["HF_ENDPOINT"] = "https://hub-ci.huggingface.co/"
hf_token = "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"

from datasets import load_dataset

# public
dataset_name = "__DUMMY_DATASETS_SERVER_USER__/repo_csv_data-16612654226756"
config_name = "__DUMMY_DATASETS_SERVER_USER__--repo_csv_data-16612654226756"
split_name = "train"

iterable_dataset = load_dataset(
    dataset_name,
    name=config_name,
    split=split_name,
    streaming=True,
    use_auth_token=hf_token,
)
iterable_dataset = iterable_dataset._resolve_features()
print(iterable_dataset.features)

# gated
dataset_name = "__DUMMY_DATASETS_SERVER_USER__/repo_csv_data-16612654317644"
config_name = "__DUMMY_DATASETS_SERVER_USER__--repo_csv_data-16612654317644"
split_name = "train"


iterable_dataset = load_dataset(
    dataset_name,
    name=config_name,
    split=split_name,
    streaming=True,
    use_auth_token=hf_token,
)
try:
    iterable_dataset = iterable_dataset._resolve_features()
except FileNotFoundError as e:
    # Show the underlying error instead of discarding it.
    print("FAILS:", e)

Expected results

I expect the same result on a public dataset and on a gated (or private) dataset when the token has been provided.
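
This expectation can be written as a small check. It is only a sketch: `try_resolve` and `compare_feature_resolution` are hypothetical helper names, not part of `datasets`. Each argument is assumed to be a zero-argument callable wrapping one of the `load_dataset(...)._resolve_features()` calls from the reproduction script.

```python
from typing import Any, Callable, Dict


def try_resolve(resolve: Callable[[], Any]) -> Dict[str, Any]:
    """Run a feature-resolving callable and record whether it succeeded."""
    try:
        return {"ok": True, "features": resolve(), "error": None}
    except FileNotFoundError as err:
        # Same exception type the reproduction script catches for the gated dataset.
        return {"ok": False, "features": None, "error": str(err)}


def compare_feature_resolution(
    resolve_public: Callable[[], Any],
    resolve_gated: Callable[[], Any],
) -> bool:
    """Return True only when both callables resolve features without error."""
    public = try_resolve(resolve_public)
    gated = try_resolve(resolve_gated)
    return public["ok"] and gated["ok"]
```

With the behavior reported here, the check would presumably return `False`, since the gated call raises `FileNotFoundError`; once the token is honored it should return `True`.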

Actual results

An exception is thrown on gated datasets.

Environment info

  • datasets version: 2.4.0
  • Platform: Linux-5.15.0-1017-aws-x86_64-with-glibc2.35
  • Python version: 3.9.6
  • PyArrow version: 7.0.0
  • Pandas version: 1.4.2

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

lhoestq commented, Oct 17, 2022 (1 reaction)

Yes, definitely!

lhoestq commented, Oct 17, 2022 (1 reaction)

Yes, exactly, this is a known bug.
