datasets doesn't support # in data paths
Describe the bug
Dataset files whose paths contain the # symbol aren’t read correctly.
Steps to reproduce the bug
The data in the folder c# of this dataset can’t be loaded, while the folder c_sharp, which contains the same data, loads properly:
ds = load_dataset('loubnabnl/bigcode_csharp', split="train", data_files=["data/c#/*"])
FileNotFoundError: Couldn't find file at https://huggingface.co/datasets/loubnabnl/bigcode_csharp/resolve/27a3166cff4bb18e11919cafa6f169c0f57483de/data/c#/data_0003.jsonl
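A likely explanation, offered as an assumption rather than something the report states: in a URL, an unescaped # starts the fragment, so everything after it is not treated as part of the path. The sketch below uses only the standard library's `urlsplit` on the URL from the error message to show where the path gets cut off.

```python
from urllib.parse import urlsplit

# URL copied from the FileNotFoundError above.
url = (
    "https://huggingface.co/datasets/loubnabnl/bigcode_csharp/resolve/"
    "27a3166cff4bb18e11919cafa6f169c0f57483de/data/c#/data_0003.jsonl"
)

parts = urlsplit(url)

# The path ends at "data/c"; the file name lands in the fragment instead.
print(parts.path)
print(parts.fragment)
```

Since the fragment is not part of the resource path, a request built from this URL would presumably ask for `.../data/c`, which is consistent with the file not being found.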
Environment info
- datasets version: 2.5.2
- Platform: macOS-12.2.1-arm64-arm-64bit
- Python version: 3.9.13
- PyArrow version: 9.0.0
- Pandas version: 1.4.3
cc @lhoestq
Top GitHub Comments
repo_id can only contain alphanumeric characters and _- so it doesn’t need to be encoded. However, I agree it’s a good idea to also apply quote to the revision, as well as in 2.

Should be fixed by https://github.com/huggingface/datasets/issues/5099 - we’ll do a release later today
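As a rough illustration of the idea discussed in these comments, here is a minimal sketch that percent-encodes the file path and the revision while leaving repo_id untouched. It assumes `quote` refers to `urllib.parse.quote`; the helper name `build_resolve_url` is hypothetical and is not the library's actual code, and encoding the revision with `safe=""` is a choice made for this sketch.

```python
from urllib.parse import quote


def build_resolve_url(repo_id: str, revision: str, path: str) -> str:
    """Hypothetical helper: build a dataset file URL with escaped parts."""
    # repo_id is left as-is, per the comment that it needs no encoding.
    # quote() keeps "/" by default, so directory separators in path survive,
    # while "#" becomes "%23".
    quoted_path = quote(path)
    # Assumption for this sketch: escape every reserved character in the
    # revision, including "/".
    quoted_revision = quote(revision, safe="")
    return (
        f"https://huggingface.co/datasets/{repo_id}"
        f"/resolve/{quoted_revision}/{quoted_path}"
    )


url = build_resolve_url(
    "loubnabnl/bigcode_csharp",
    "27a3166cff4bb18e11919cafa6f169c0f57483de",
    "data/c#/data_0003.jsonl",
)
print(url)
```

With the path escaped this way, the # no longer acts as a fragment delimiter, so the whole file path stays in the request.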