ArrowInvalid: Could not convert <PIL.Image.Image image mode=RGB when adding image to Dataset
Describe the bug
When adding a Pillow image to an existing Dataset on the Hub, add_item fails because the Pillow image is not automatically converted into the Image feature.
Steps to reproduce the bug
from datasets import load_dataset
from PIL import Image
dataset = load_dataset("hf-internal-testing/example-documents")
# load any random Pillow image
image = Image.open("/content/cord_example.png").convert("RGB")
new_image = {'image': image}
# add_item raises ArrowInvalid here (see Actual results)
dataset['test'] = dataset['test'].add_item(new_image)
Expected results
The image should be automatically cast to the Image feature when using add_item. For now, this can be worked around by using encode_example:
import datasets
# Encode the PIL image into the Image feature's {"path", "bytes"} form before adding it
feature = datasets.Image(decode=False)
new_image = {'image': feature.encode_example(image)}
dataset['test'] = dataset['test'].add_item(new_image)
Actual results
ArrowInvalid: Could not convert <PIL.Image.Image image mode=RGB size=576x864 at 0x7F7CCC4589D0> with type Image: did not recognize Python value type when inferring an Arrow data type
Top GitHub Comments
Hi @darraghdog! No PR yet, but I plan to fix this before the next release.
@mariosasko I’m getting a similar issue when creating a Dataset from a Pandas dataframe, like so:
results in
Will the PR linked above also fix that?