
[feature request] Allow passing in image reading/loading function to old-style datasets constructors


🚀 The feature

This makes more sense now that torchvision ships its own compiled image reading functions (read_image), so an easier way to test pipelines without PIL would be nice.

In addition, this would make it easy to stub out image loading when desired, and it would solve https://github.com/pytorch/vision/issues/4975
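To make the request concrete, here is a minimal sketch of what an injectable loader could look like. It uses only the standard library; `PathListDataset`, `read_bytes_loader` and `stub_loader` are hypothetical names invented for illustration and are not part of torchvision.

```python
from pathlib import Path
from typing import Any, Callable, Iterable, Tuple


def read_bytes_loader(path: str) -> bytes:
    """Default loader for this sketch: return the raw bytes of the file."""
    return Path(path).read_bytes()


class PathListDataset:
    """Hypothetical old-style dataset whose image loading function is injectable."""

    def __init__(
        self,
        samples: Iterable[Tuple[str, int]],
        loader: Callable[[str], Any] = read_bytes_loader,
    ) -> None:
        self.samples = list(samples)
        self.loader = loader

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Tuple[Any, int]:
        path, target = self.samples[index]
        # All image reading goes through the injected callable.
        return self.loader(path), target


def stub_loader(path: str) -> str:
    """Stub for tests: never touches the disk."""
    return f"stub:{path}"


dataset = PathListDataset([("a.jpg", 0), ("b.jpg", 1)], loader=stub_loader)
first_image, first_target = dataset[0]
```

With a constructor like this, swapping PIL for another reader or for a stub is a one-argument change and needs no patching.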

Motivation, pitch

N/A

Alternatives

No response

Additional context

No response

cc @pmeier

Issue Analytics

Created 2 years ago
Comments: 14 (1 by maintainers)

Top GitHub Comments

1 reaction

pmeier commented, Dec 9, 2021

Unfortunately, I don’t think the change is simple. The functionality you propose is implemented in the new-style datasets, but we have already run into trouble with it and I see more to come. I’ve opened #5075 to discuss how we want to handle decoding in the future. While we will find a solution for the new-style datasets, I’m not eager to also maintain the same functionality in the old-style ones.

From your request I gather that you only want to disable decoding, not use custom decoding. If you don’t care about the actual data and only use it for testing, can’t you simply patch it out?

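from unittest import mock

from torchvision import datasets

dataset = datasets.VOCDetection(...)

# Without the patch, the image is opened and decoded by PIL as usual.
print(dataset[0][0])

# Patch PIL's Image.open (looked up through the VOC module) so no image file is opened or decoded.
with mock.patch("torchvision.datasets.voc.Image.open"):
    print(dataset[0][0])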

<PIL.Image.Image image mode=RGB size=500x442 at 0x7F705C1E8A90>
<MagicMock name='open().convert()' id='140119809177920'>

With the patch, iterating over the complete dataset takes ~1.5 seconds on my machine.
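The same patching idea can be tried without torchvision installed. The sketch below is standard library only; `TinyDataset` and its `_load_image` method are hypothetical stand-ins for an old-style dataset and its hard-coded image reading step, not torchvision code.

```python
from unittest import mock


class TinyDataset:
    """Hypothetical dataset with a hard-coded loading step, like the old-style datasets."""

    def __init__(self, paths):
        self.paths = list(paths)

    def _load_image(self, path):
        # Stand-in for the real decoding call; would fail for files that do not exist.
        with open(path, "rb") as file:
            return file.read()

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        return self._load_image(self.paths[index])


# These files do not exist, so unpatched access would raise FileNotFoundError.
dataset = TinyDataset(["missing_0.jpg", "missing_1.jpg"])

# Inside the block the loading step is replaced by a mock, so no file is opened.
with mock.patch.object(TinyDataset, "_load_image", return_value="fake image") as fake_load:
    items = [dataset[i] for i in range(len(dataset))]
```

The patch is undone when the `with` block exits, so the rest of a test suite still sees the real loading code.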

1 reaction

pmeier commented, Nov 25, 2021

Yes, the new datasets will use the decoder parameter to do just that. You can pass any callable to datasets.load(..., decoder=) that takes an open file handle and returns a tensor. torchvision.io.read_image currently takes a path, so we are using PIL by default now.
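As a rough illustration of that contract (a callable that receives an open file handle), here is a standard-library sketch. `load_samples` and `byte_list_decoder` are hypothetical names and do not reflect the actual datasets.load implementation; the decoder returns a list of ints where a real one would return a tensor.

```python
import tempfile
from pathlib import Path
from typing import Any, BinaryIO, Callable, Iterable, Iterator, List, Tuple


def byte_list_decoder(file: BinaryIO) -> List[int]:
    """Stand-in decoder: a real one would return a tensor rather than a list of ints."""
    return list(file.read())


def load_samples(
    paths: Iterable[Path], decoder: Callable[[BinaryIO], Any]
) -> Iterator[Tuple[str, Any]]:
    """Hypothetical loader: opens each path and hands the open handle to the decoder."""
    for path in paths:
        with open(path, "rb") as file:
            yield str(path), decoder(file)


with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.bin"
    sample_path.write_bytes(b"\x01\x02\x03")
    # Consume the generator while the temporary file still exists.
    decoded = list(load_samples([sample_path], decoder=byte_list_decoder))
```

Because the decoder only sees a file handle, a path-based reader cannot be passed in directly without an adapter, which is the limitation described above.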

There are still some design choices to be made about what a decoder is and what it should return. For example, how do we handle the case where more than one type of file needs to be decoded? Furthermore, how do we handle multiple return values, for example the audio and video tensors produced by decoding a video?