Dataset Viewer issue for darragh/demo_data_raw3

Link

https://huggingface.co/datasets/darragh/demo_data_raw3

Description

Exception:     ValueError
Message:       Arrow type extension<arrow.py_extension_type<pyarrow.lib.UnknownExtensionType>> does not have a datasets dtype equivalent.

reported by @NielsRogge

Owner

No

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
severo commented, Aug 12, 2022

Do you have any idea why this can occur, @huggingface/datasets? The dataset consists of a single Parquet file.

0 reactions
albertvillanova commented, Aug 13, 2022

Apparently, there is something weird with that Parquet file: its schema is:

images: extension<arrow.py_extension_type<pyarrow.lib.UnknownExtensionType>>

I forced the correct schema when loading the file:

from datasets import Features, Image, load_dataset

# Declare the column as an Image feature instead of relying on the schema stored in the file.
features = Features({"images": Image()})
ds = load_dataset("parquet", split="train", data_files="train-00000-of-00001.parquet", features=features)

and then recreated a new Parquet file:

ds.to_parquet("train.parquet")

Now this Parquet file has the right schema:

images: struct<bytes: binary, path: string>
  child 0, bytes: binary
  child 1, path: string

and can be loaded normally:

In [26]: ds = load_dataset("parquet", split="train", data_files="dataset.parquet")
In [27]: ds
Out[27]: 
Dataset({
    features: ['images'],
    num_rows: 20
})