One `imread` to rule them all

A lot of people have put a lot of effort into imread lately. This is great, and it’s really helped. However, we’ve still got a way to go.

These are the four major areas where I see problems popping up:

1. Read image data into Dask arrays accurately. We need more simple test cases here. Bug report: https://github.com/dask/dask-image/issues/220
2. Reduce confusion. Currently, there are multiple implementations of a Dask imread function. The two most easily confused are dask_image.imread.imread() and dask.array.image.imread(). We need to figure out which is best, and only use that one.
3. Read data in fast. For that, we'll need proper benchmarks that run routinely as part of the CI. This will also help us decide (2) above. Previous discussion:
   - Imread performance issue https://github.com/dask/dask-image/issues/181
   - Getting movie files into Dask efficiently https://github.com/dask/dask-image/issues/134
4. Process the image data fast, too. For that to happen, we need smart default choices for how we chunk image data in Dask arrays. Jackson Maxfield Brown describes the problem well in this short video here
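Item 3 calls for proper benchmarks. As a minimal sketch, assuming each reader can be wrapped in a zero-argument callable that reads a file and forces computation, a best-of-n timing harness needs only the standard library. `benchmark_readers` is a hypothetical helper, not part of dask or dask-image, and "movie.tif" in the usage comments is a placeholder path; a real CI setup would more likely use a dedicated benchmarking tool such as airspeed velocity (asv).

```python
import time
from typing import Callable, Dict


def benchmark_readers(
    readers: Dict[str, Callable[[], object]], repeats: int = 5
) -> Dict[str, float]:
    """Hypothetical helper: best (minimum) wall-clock seconds per reader.

    Each reader is a zero-argument callable that reads a file and forces
    the data into memory. Taking the minimum over several repeats is less
    sensitive to background noise than timing a single run.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results = {}
    for name, read in readers.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            read()
            timings.append(time.perf_counter() - start)
        results[name] = min(timings)
    return results


# Example usage (requires dask and dask-image; "movie.tif" is a placeholder):
# import dask.array.image
# import dask_image.imread
# times = benchmark_readers({
#     "dask.array.image.imread": lambda: dask.array.image.imread("movie.tif").compute(),
#     "dask_image.imread.imread": lambda: dask_image.imread.imread("movie.tif").compute(),
# })
```

Calling `.compute()` inside each lambda matters: building a Dask array is lazy, so timing only the `imread` call would measure graph construction rather than actual reading.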

Top GitHub Comments

jakirkham commented, May 13, 2022

Yeah, this comes up with large multipage TIFFs. They can be kind of movie-like.

I wonder if we should just move to using ImageIO here once PR https://github.com/imageio/imageio/pull/739 is in? It's hard to support all of the different file formats and use cases out there. A better separation of concerns might improve the user experience.

Edit: Also broadly related: https://github.com/dask/dask/issues/9049

GenevieveBuckley commented, May 13, 2022

One big disadvantage of dask.array.image.imread is its poor chunking behaviour. It appears to make a single chunk for every file on disk. This is not great for movie files, multislice TIFFs, etc., where you probably don't want to load the whole file into RAM at once.
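To make the memory cost concrete, here is a minimal stdlib sketch that builds a Dask-style chunks tuple for a (frames, height, width) stack and computes how many bytes the largest chunk needs. The helpers `frame_chunks` and `largest_chunk_bytes` are hypothetical; `nframes_per_chunk` is only loosely analogous to the `nframes` parameter of `dask_image.imread.imread()`. Setting `nframes_per_chunk` equal to the total frame count reproduces the "one chunk per file" behaviour described above.

```python
from typing import Tuple

# Dask-style chunks: one tuple of block sizes per axis (frames, height, width).
Chunks = Tuple[Tuple[int, ...], Tuple[int, ...], Tuple[int, ...]]


def frame_chunks(
    n_frames: int, height: int, width: int, nframes_per_chunk: int
) -> Chunks:
    """Hypothetical helper: chunk a stack along the frame axis only.

    Each frame stays whole; the last chunk holds any leftover frames.
    """
    if n_frames < 1 or nframes_per_chunk < 1:
        raise ValueError("n_frames and nframes_per_chunk must be positive")
    full, remainder = divmod(n_frames, nframes_per_chunk)
    frame_axis = (nframes_per_chunk,) * full + ((remainder,) if remainder else ())
    return frame_axis, (height,), (width,)


def largest_chunk_bytes(chunks: Chunks, itemsize: int) -> int:
    """Bytes needed to hold the biggest chunk in memory at once."""
    return max(chunks[0]) * max(chunks[1]) * max(chunks[2]) * itemsize


# A 1000-frame, 512x512, 16-bit (itemsize 2) movie:
one_chunk = frame_chunks(1000, 512, 512, 1000)  # whole file in one chunk
per_frame = frame_chunks(1000, 512, 512, 1)     # one frame per chunk
print(largest_chunk_bytes(one_chunk, 2))  # 524288000 (about 500 MiB)
print(largest_chunk_bytes(per_frame, 2))  # 524288 (512 KiB)
```

With a single chunk per file, any computation touching even one frame has to materialise the entire movie; splitting along the frame axis keeps each task's memory footprint proportional to the frames it actually needs.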

See https://github.com/dask/dask-image/issues/262#issuecomment-1125063820