load_dataset does not read jsonl metadata file properly
Describe the bug
Hi, I’m following this page to create a dataset of images and captions from an image folder and a metadata.jsonl file, but I can’t get the loader to recognize the “text” column. It only returns “image” and “label” as features.
Below is code to reproduce my exact example/problem.
Steps to reproduce the bug
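import gdown
from datasets import load_dataset

# Download the zipped image folder from Google Drive
file_id = "19Unu89Ih_kP6zsE7f9Mkw8dy3NwHopRF"
output = "Godardv01.zip"
gdown.download(id=file_id, output=output, quiet=False)
# Assumes the archive has already been extracted so that data_dir points at the train folder
ds = load_dataset("imagefolder", data_dir="/kaggle/working/Volumes/TOSHIBA/Godard_imgs/Volumes/TOSHIBA/Godard_imgs/Full/train", split="train", drop_labels=False)
print(ds)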
Expected behavior
I expected the code above to return “image” and “text” columns.
Environment info
- datasets version: 2.1.0
- Platform: Linux-5.15.65+-x86_64-with-debian-bullseye-sid
- Python version: 3.7.12
- PyArrow version: 5.0.0
- Pandas version: 1.3.5
Issue Analytics
- Created 10 months ago
- Comments: 6 (2 by maintainers)
Top GitHub Comments
Can you try updating datasets? Metadata support was added in datasets 2.4.
Update: This was the issue.
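
For anyone hitting the same thing, here is a rough sketch of the two checks that matter: whether the installed version is new enough, and whether metadata.jsonl has the shape the imagefolder loader looks for (one JSON object per line, with a file_name key naming the image). The helper names version_tuple, supports_metadata and write_metadata are made up for this example, and the 2.4 cutoff is taken from the maintainer comment above rather than checked against the changelog.

```python
import itertools
import json
from pathlib import Path


def version_tuple(version):
    """Turn a version string such as '2.1.0' into (2, 1, 0).

    Only the leading digits of the first three dot-separated pieces are
    used, so a suffix like 'dev0' or 'rc1' is ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        leading = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(leading) if leading else 0)
    return tuple(parts)


def supports_metadata(version, minimum=(2, 4, 0)):
    """Return True if `version` is at least `minimum`.

    The default minimum is an assumption based on the comment above.
    """
    return version_tuple(version) >= minimum


def write_metadata(image_dir, captions):
    """Write metadata.jsonl into image_dir, one JSON object per line.

    `captions` maps an image file name (relative to image_dir) to its text.
    """
    path = Path(image_dir) / "metadata.jsonl"
    with path.open("w", encoding="utf-8") as handle:
        for file_name, text in captions.items():
            record = {"file_name": file_name, "text": text}
            handle.write(json.dumps(record) + "\n")
    return path
```

With the version reported in this issue, supports_metadata("2.1.0") is False, which matches the behaviour described: the metadata file is ignored and only “image” and “label” come back. After upgrading (for example with pip install -U datasets), the same load_dataset("imagefolder", ...) call should pick up the “text” column from metadata.jsonl.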