datasets doesn't support # in data paths

Describe the bug

Dataset files with a # symbol in their paths aren't read correctly.

Steps to reproduce the bug

The data in the folder c# of this dataset can't be loaded, while the folder c_sharp, which contains the same data, loads properly:

from datasets import load_dataset

ds = load_dataset('loubnabnl/bigcode_csharp', split="train", data_files=["data/c#/*"])

which raises:

FileNotFoundError: Couldn't find file at https://huggingface.co/datasets/loubnabnl/bigcode_csharp/resolve/27a3166cff4bb18e11919cafa6f169c0f57483de/data/c#/data_0003.jsonl
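The URL in the error message points to a likely cause. In a URL, everything after an unescaped # is parsed as a fragment, and the fragment is not sent as part of the request path, so the server would only see a path ending in .../data/c. The following minimal sketch uses only Python's standard urllib.parse, not code from datasets itself, to show how the failing URL is split:

```python
from urllib.parse import quote, urlparse

# The URL copied from the FileNotFoundError above.
url = (
    "https://huggingface.co/datasets/loubnabnl/bigcode_csharp/resolve/"
    "27a3166cff4bb18e11919cafa6f169c0f57483de/data/c#/data_0003.jsonl"
)

parts = urlparse(url)
# The path stops at the '#': it ends in ".../data/c".
print(parts.path)
# Everything after the '#' becomes the fragment: "/data_0003.jsonl".
print(parts.fragment)

# Percent-encoding the file path turns '#' into '%23' and keeps '/' as-is.
encoded = quote("data/c#/data_0003.jsonl")
print(encoded)  # data/c%23/data_0003.jsonl
```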

Environment info

  • datasets version: 2.5.2
  • Platform: macOS-12.2.1-arm64-arm-64bit
  • Python version: 3.9.13
  • PyArrow version: 9.0.0
  • Pandas version: 1.4.3

cc @lhoestq

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 9 (8 by maintainers)

Top GitHub Comments

lhoestq commented, Oct 12, 2022 (2 reactions)

repo_id can only contain alphanumeric characters, _ and -, so it doesn't need to be encoded.

However, I agree it's a good idea to also apply quote to the revision, as in point 2!
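The suggestion amounts to URL-encoding each variable part of the resolve URL before building it. Below is a minimal sketch of that idea. It assumes the URL layout shown in the error message, and build_resolve_url is a hypothetical helper, not the function used in datasets or in the fix linked below:

```python
from urllib.parse import quote

# Assumed base URL, taken from the error message in the issue.
HUB_ENDPOINT = "https://huggingface.co"


def build_resolve_url(repo_id: str, revision: str, path_in_repo: str) -> str:
    """Hypothetical helper: build a dataset file URL with encoded parts."""
    # repo_id is inserted unquoted, following the maintainer's comment above.
    # The revision is quoted with safe="" so that a "/" in a branch name
    # (e.g. "refs/pr/1") becomes %2F instead of adding extra path segments.
    rev = quote(revision, safe="")
    # The file path keeps "/" as the directory separator (quote's default),
    # but "#" becomes %23 instead of starting a URL fragment.
    path = quote(path_in_repo)
    return f"{HUB_ENDPOINT}/datasets/{repo_id}/resolve/{rev}/{path}"


url = build_resolve_url("loubnabnl/bigcode_csharp", "main", "data/c#/data_0003.jsonl")
print(url)
# https://huggingface.co/datasets/loubnabnl/bigcode_csharp/resolve/main/data/c%23/data_0003.jsonl
```

Encoding the revision with safe="" is a design choice in this sketch. It treats the revision as a single path segment, whereas the file path keeps its slashes because they are real directory separators.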

lhoestq commented, Oct 12, 2022 (1 reaction)

Should be fixed by https://github.com/huggingface/datasets/issues/5099 - we’ll do a release later today
