imread - investigate possible performance improvement

It’s been found that the performance of da.map_blocks is much better than da.stack when joining large arrays: https://github.com/dask/dask/issues/5913

It’s unclear if da.concatenate (like we use in imread) is also slower, but this seems likely. We should investigate if we can get a performance benefit by switching to da.map_blocks.
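A minimal sketch of how such a comparison could be set up, assuming an in-memory stand-in for the file reader. `fake_read`, `N_FRAMES`, and `FRAME_SHAPE` are hypothetical names used only for illustration, and this is not the dask-image implementation. It builds the same lazy array once by concatenating per-frame delayed chunks and once with a single `da.map_blocks` call, then times only the construction step. The measured times will vary with the dask version and the machine.

```python
import time

import dask
import dask.array as da
import numpy as np

N_FRAMES = 200          # hypothetical number of single-frame "files"
FRAME_SHAPE = (2, 2)    # hypothetical frame size
DTYPE = np.uint16


def fake_read(i):
    """Stand-in for reading frame i from disk (hypothetical helper)."""
    return np.full((1,) + FRAME_SHAPE, i, dtype=DTYPE)


def build_with_concatenate():
    # One delayed task and one small dask array per frame, joined at the end.
    parts = [
        da.from_delayed(
            dask.delayed(fake_read)(i), (1,) + FRAME_SHAPE, dtype=DTYPE
        )
        for i in range(N_FRAMES)
    ]
    return da.concatenate(parts)


def build_with_map_blocks():
    # A single map_blocks call; each block looks up its own frame index.
    def read_block(block_info=None):
        start, _stop = block_info[None]["array-location"][0]
        return fake_read(start)

    chunks = ((1,) * N_FRAMES,) + tuple((s,) for s in FRAME_SHAPE)
    return da.map_blocks(read_block, chunks=chunks, dtype=DTYPE)


def timed(builder):
    # Time only the construction of the lazy array, not its computation.
    t0 = time.perf_counter()
    arr = builder()
    return arr, time.perf_counter() - t0


a_cat, t_cat = timed(build_with_concatenate)
a_mb, t_mb = timed(build_with_map_blocks)
print(f"concatenate: {t_cat:.4f} s, map_blocks: {t_mb:.4f} s")
```

Both arrays should compute to identical data, so any difference between the two timings comes from how the array is assembled rather than from what it contains.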

Issue Analytics

Created 3 years ago
Comments: 16 (3 by maintainers)

Top GitHub Comments

m-albert commented, Oct 21, 2020

Hi guys, just saw this here and remembered that I stumbled upon bad performance of dask.array.stack in the past when creating large arrays.

"We should investigate if we can get a performance benefit by switching to da.map_blocks."

So I added a da.map_blocks version of dask_image.imread.imread and compared it to the current implementation using da.concatenate.

map_blocks implementation:

import itertools
import numbers
import warnings

import dask
import dask.array
import dask.delayed
import numpy
import pims

from dask_image.imread import _utils

def imread_mb(fname, nframes=1, *, arraytype="numpy"):
    """
    Read image data into a Dask Array.

    Provides a simple, fast mechanism to ingest image data into a
    Dask Array.

    Parameters
    ----------
    fname : str
        A glob like string that may match one or multiple filenames.
    nframes : int, optional
        Number of the frames to include in each chunk (default: 1).
    arraytype : str, optional
        Array type for dask chunks. Available options: "numpy", "cupy".

    Returns
    -------
    array : dask.array.Array
        A Dask Array representing the contents of all image files.
    """

    if not isinstance(nframes, numbers.Integral):
        raise ValueError("`nframes` must be an integer.")
    if (nframes != -1) and not (nframes > 0):
        raise ValueError("`nframes` must be greater than zero.")

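    if arraytype == "numpy":
        arrayfunc = numpy.asanyarray
    elif arraytype == "cupy":   # pragma: no cover
        import cupy
        arrayfunc = cupy.asanyarray
    else:
        # Without this branch, an unknown value leaves `arrayfunc` undefined.
        raise ValueError('`arraytype` must be "numpy" or "cupy".')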

    with pims.open(fname) as imgs:
        shape = (len(imgs),) + imgs.frame_shape
        dtype = numpy.dtype(imgs.pixel_type)

    if nframes == -1:
        nframes = shape[0]

    if nframes > shape[0]:
        warnings.warn(
            "`nframes` larger than number of frames in file."
            " Will truncate to number of frames in file.",
            RuntimeWarning
        )
        # Make the truncation announced by the warning explicit.
        nframes = shape[0]
    elif shape[0] % nframes != 0:
        warnings.warn(
            "`nframes` does not nicely divide number of frames in file."
            " Last chunk will contain the remainder.",
            RuntimeWarning
        )

    lower_iter, upper_iter = itertools.tee(itertools.chain(
        range(0, shape[0], nframes),
        [shape[0]]
    ))
    next(upper_iter)
    
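    # Previous da.concatenate-based construction, kept for reference
    # (it is the only consumer of lower_iter/upper_iter above):
#     a = []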
#     for i, j in zip(lower_iter, upper_iter):
#         print(i, j)
#         a.append(dask.array.from_delayed(
#             dask.delayed(_utils._read_frame)(fname, slice(i, j),
#                                              arrayfunc=arrayfunc),
#             (j - i,) + shape[1:],
#             dtype,
#             meta=arrayfunc([])
#         ))
#     a = dask.array.concatenate(a)

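    def func(fname, arrayfunc, block_info=None):
        # block_info[None] describes the output block being computed;
        # 'array-location' holds its (start, stop) range along each axis.
        i, j = block_info[None]['array-location'][0]
        return _utils._read_frame(fname, slice(i, j), arrayfunc=arrayfunc)
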
    from dask.array.core import normalize_chunks
    a = dask.array.map_blocks(
        func,
        chunks=normalize_chunks((nframes, ) + shape[1:], shape),
        fname=fname,
        arrayfunc=arrayfunc,
        # Carry the image dtype on the meta; a bare arrayfunc([]) is float64.
        meta=arrayfunc([]).astype(dtype),
    )

    return a
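To make the chunking in the function above easier to follow, here is a small sketch of what the `normalize_chunks` call and the `block_info` lookup produce. The shape and `nframes` values are made up for illustration, and `first_frame_index` is a hypothetical block function that fills each block with the index of its first frame instead of reading a file.

```python
import dask.array as da
import numpy as np
from dask.array.core import normalize_chunks

shape = (10, 2, 2)   # illustrative: 10 frames of 2x2 pixels
nframes = 4          # illustrative: does not divide 10 evenly

# Expand the per-chunk block shape into explicit chunk sizes per axis.
chunks = normalize_chunks((nframes,) + shape[1:], shape)
print(chunks)  # ((4, 4, 2), (2,), (2,))


def first_frame_index(block_info=None):
    # 'array-location' lists the (start, stop) range of this block per axis.
    location = block_info[None]["array-location"]
    block_shape = tuple(stop - start for start, stop in location)
    start = location[0][0]
    return np.full(block_shape, start, dtype=np.int64)


arr = da.map_blocks(first_frame_index, chunks=chunks, dtype=np.int64)
starts = arr[:, 0, 0].compute()
print(starts)  # [0 0 0 0 4 4 4 4 8 8]
```

The last chunk holds the two remaining frames, which matches the "Last chunk will contain the remainder" warning in the function.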

Comparison:

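# write some dummy data
import os
import tifffile
import numpy as np
os.makedirs('data', exist_ok=True)  # the output directory must exist
for t in range(10000):
    tmpim = np.random.randint(0, 1000, [2, 2]).astype(np.uint16)
    # imwrite replaces the older, deprecated imsave alias
    tifffile.imwrite('data/im_t%03d.tif' % t, tmpim)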

from dask_image import imread
%timeit imread.imread('data/im_*.tif')

2.96 s ± 46.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

from dask_image import imread
%timeit imread_mb('data/im_*.tif')

150 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

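So there’s a big performance difference: constructing the lazy array takes about 150 ms with map_blocks versus 2.96 s with da.concatenate, roughly a 20x speedup.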

Also, indexing the resulting array is faster in the map_blocks version:

im = imread.imread('data/im_*.tif')
im_mb = imread_mb('data/im_*.tif')

def iterate_through_first_axis(im):
    for i in range(im.shape[0]):
        im[i]
    return

%timeit iterate_through_first_axis(im)

11.4 s ± 199 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit iterate_through_first_axis(im_mb)

1.35 s ± 22.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So it seems that map_blocks is the way to go for putting together large images. Happy to open a PR.

GenevieveBuckley commented, Oct 26, 2020

Closed by https://github.com/dask/dask-image/pull/165
