load_dataset does not read jsonl metadata file properly

Describe the bug

Hi, I’m following this page to create a dataset of images and captions via an image folder and a metadata.json file, but I can’t seem to get the dataloader to recognize the “text” column. It just spits out “image” and “label” as features.

Below is code to reproduce my exact example/problem.

Steps to reproduce the bug

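import gdown
from datasets import load_dataset

# Download the zipped image folder from Google Drive.
dataset_id = "19Unu89Ih_kP6zsE7f9Mkw8dy3NwHopRF"
output = "Godardv01.zip"
gdown.download(id=dataset_id, output=output, quiet=False)

# Load the image folder (assumes the archive was already extracted to this path).
ds = load_dataset("imagefolder", data_dir="/kaggle/working/Volumes/TOSHIBA/Godard_imgs/Volumes/TOSHIBA/Godard_imgs/Full/train", split="train", drop_labels=False)
print(ds)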

Expected behavior

I expected the code above to return “image” and “text” columns.
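
For reference, here is a minimal sketch of the metadata layout that the imagefolder builder looks for in datasets versions with metadata support: a metadata.jsonl file next to the images, one JSON object per line, with a "file_name" field pointing at each image. The helper name write_metadata, the file names, and the captions are hypothetical and only illustrate the shape of the file.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(image_dir, captions):
    """Write metadata.jsonl into image_dir: one JSON object per line."""
    metadata_path = Path(image_dir) / "metadata.jsonl"
    with metadata_path.open("w", encoding="utf-8") as f:
        for file_name, text in captions.items():
            # "file_name" ties the row to an image; other keys become columns.
            f.write(json.dumps({"file_name": file_name, "text": text}) + "\n")
    return metadata_path


# Hypothetical file names and captions, for illustration only.
captions = {"0001.png": "a street at night", "0002.png": "two people in a cafe"}
image_dir = tempfile.mkdtemp()
metadata_path = write_metadata(image_dir, captions)
print(metadata_path.read_text(encoding="utf-8"))
```

With a file like this in the same directory as the images, a datasets version that supports metadata should expose a “text” column alongside “image”.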

Environment info

  • datasets version: 2.1.0
  • Platform: Linux-5.15.65+-x86_64-with-debian-bullseye-sid
  • Python version: 3.7.12
  • PyArrow version: 5.0.0
  • Pandas version: 1.3.5

Issue Analytics

  • State: closed
  • Created: 10 months ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

2 reactions
lhoestq commented, Nov 22, 2022

Can you try updating datasets? Metadata support was added in datasets 2.4.

0 reactions
065294847 commented, Nov 23, 2022

Can you try updating datasets? Metadata support was added in datasets 2.4.

Update: This was the issue.
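
A quick way to check for this situation is to compare the installed datasets version against 2.4, the version the maintainer names. The sketch below is one possible check, not an official API: the helper names parse_major_minor and supports_metadata are hypothetical, and importlib.metadata requires Python 3.8 or newer (the environment in this issue ran Python 3.7.12, where the importlib_metadata backport would be needed instead).

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_METADATA_VERSION = (2, 4)  # per the maintainer's comment above


def parse_major_minor(version_string):
    """Return (major, minor) from a version string such as '2.1.0'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def supports_metadata(version_string):
    """True if this datasets version is at least 2.4."""
    return parse_major_minor(version_string) >= MIN_METADATA_VERSION


try:
    installed = version("datasets")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("datasets is not installed")
elif supports_metadata(installed):
    print(f"datasets {installed}: metadata files should be picked up")
else:
    print(f"datasets {installed}: upgrade with `pip install -U datasets`")
```

For the version reported in this issue, supports_metadata("2.1.0") is false, which matches the missing “text” column.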
