ArrowInvalid: Could not convert <PIL.Image.Image image mode=RGB when adding image to Dataset


Describe the bug

When adding a Pillow image to an existing Dataset on the hub, add_item fails because the Pillow image is not automatically converted to the Image feature.

Steps to reproduce the bug

from datasets import load_dataset
from PIL import Image

dataset = load_dataset("hf-internal-testing/example-documents")

# load any random Pillow image
image = Image.open("/content/cord_example.png").convert("RGB")

new_image = {'image': image}
dataset['test'] = dataset['test'].add_item(new_image)

Expected results

The image should be automatically cast to the Image feature when using add_item. For now, this can be worked around by encoding the image with encode_example before adding it:

import datasets

feature = datasets.Image(decode=False)
new_image = {'image': feature.encode_example(image)}
dataset['test'] = dataset['test'].add_item(new_image)
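
To illustrate why the workaround helps: Arrow cannot infer a type for a Pillow object, but it can for a plain dict of bytes and a path. The sketch below builds such a dict by hand. The helper name to_image_storage is hypothetical, and the {"bytes", "path"} layout is an assumption about how the Image feature stores its data, so prefer encode_example in real code.

```python
import io
from typing import Any, Dict, Optional


def to_image_storage(image: Any, fmt: str = "PNG") -> Dict[str, Optional[bytes]]:
    """Serialize an image-like object into a {"bytes", "path"} dict.

    `image` only needs a Pillow-style `save(fp, format=...)` method.
    The dict layout is an assumption, not a documented contract.
    """
    buffer = io.BytesIO()
    # Write the encoded image (PNG by default) into an in-memory buffer.
    image.save(buffer, format=fmt)
    # No file on disk backs this image, so the path is left empty.
    return {"bytes": buffer.getvalue(), "path": None}


# Hypothetical usage with the objects from the report (assumes datasets and Pillow):
#   new_image = {'image': to_image_storage(image)}
#   dataset['test'] = dataset['test'].add_item(new_image)
```

Because the result contains only bytes and None, type inference no longer has to deal with a Python object it does not recognize.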

Actual results

ArrowInvalid: Could not convert <PIL.Image.Image image mode=RGB size=576x864 at 0x7F7CCC4589D0> with type Image: did not recognize Python value type when inferring an Arrow data type

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

mariosasko commented, Aug 19, 2022 (3 reactions)

Hi @darraghdog! No PR yet, but I plan to fix this before the next release.

NielsRogge commented, Aug 12, 2022 (3 reactions)

@mariosasko I’m getting a similar issue when creating a Dataset from a Pandas dataframe, like so:

from datasets import Dataset, Features, Image, Value
import pandas as pd
import requests
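import PIL.Image  # import the submodule explicitly; a bare `import PIL` does not guarantee PIL.Image is loaded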

# we need to define the features ourselves
features = Features({
    'a': Value(dtype='int32'),
    'b': Image(),
})

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = PIL.Image.open(requests.get(url, stream=True).raw)

df = pd.DataFrame({"a": [1, 2], 
                   "b": [image, image]})

dataset = Dataset.from_pandas(df, features=features) 

results in

ArrowInvalid: ('Could not convert <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=640x480 at 0x7F7991A15C10> with type JpegImageFile: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column b with type object')

Will the PR linked above also fix that?
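
A possible stopgap for the DataFrame case, by analogy with the encode_example workaround from the issue description, is to encode the image column before the DataFrame is built. The sketch below is untested against from_pandas and rests on that analogy. The helper name encode_image_columns is hypothetical, and the encoder is passed in so the helper itself needs no third-party imports.

```python
from typing import Any, Callable, Dict, Iterable, List


def encode_image_columns(
    columns: Dict[str, List[Any]],
    image_columns: Iterable[str],
    encode: Callable[[Any], Any],
) -> Dict[str, List[Any]]:
    """Return a copy of `columns` with each image column passed through `encode`."""
    targets = set(image_columns)
    encoded: Dict[str, List[Any]] = {}
    for name, values in columns.items():
        if name in targets:
            # Leave missing values untouched; encode everything else.
            encoded[name] = [None if v is None else encode(v) for v in values]
        else:
            encoded[name] = list(values)
    return encoded


# Hypothetical usage with the snippet above (assumes datasets and pandas):
#   data = encode_image_columns({"a": [1, 2], "b": [image, image]},
#                               ["b"], Image().encode_example)
#   dataset = Dataset.from_pandas(pd.DataFrame(data), features=features)
```

Whether this avoids the ArrowInvalid error depends on the installed datasets version, so treat it as something to try, not a confirmed fix.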


