imread - investigate possible performance improvement


It’s been found that the performance of da.map_blocks is much better than da.stack when joining large arrays: https://github.com/dask/dask/issues/5913

It’s unclear if da.concatenate (like we use in imread) is also slower, but this seems likely. We should investigate if we can get a performance benefit by switching to da.map_blocks.
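To make the two approaches concrete, here is a minimal sketch that builds the same lazy array both ways. It is not the dask-image code: `load_frames`, `N_FRAMES` and `FRAME_SHAPE` are hypothetical stand-ins for a real file reader and real image sizes.

```python
import dask
import dask.array as da
import numpy as np

N_FRAMES = 8            # hypothetical number of frames
FRAME_SHAPE = (2, 2)    # hypothetical shape of a single frame


def load_frames(start, stop):
    """Stand-in for a file reader: frame i is filled with the value i."""
    values = np.arange(start, stop, dtype=np.uint16)
    shape = (stop - start,) + FRAME_SHAPE
    return np.broadcast_to(values[:, None, None], shape).copy()


def build_with_concatenate():
    # One delayed call and one from_delayed array per frame, joined afterwards.
    pieces = [
        da.from_delayed(
            dask.delayed(load_frames)(i, i + 1),
            shape=(1,) + FRAME_SHAPE,
            dtype=np.uint16,
        )
        for i in range(N_FRAMES)
    ]
    return da.concatenate(pieces)


def build_with_map_blocks():
    # A single map_blocks call; each block looks up its own frame range
    # in block_info instead of being created as a separate array.
    def read_block(block_info=None):
        start, stop = block_info[None]["array-location"][0]
        return load_frames(start, stop)

    return da.map_blocks(
        read_block,
        chunks=((1,) * N_FRAMES, (2,), (2,)),
        meta=np.empty((0, 0, 0), dtype=np.uint16),
    )


a = build_with_concatenate()
b = build_with_map_blocks()
```

Both arrays have the same shape, chunks and contents; only the way the task graph is assembled differs.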

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 16 (3 by maintainers)

Top GitHub Comments

m-albert commented, Oct 21, 2020

Hi guys, just saw this here and remembered that I stumbled upon bad performance of dask.array.stack in the past when creating large arrays.

Quoting the issue: "We should investigate if we can get a performance benefit by switching to da.map_blocks."

So I added a da.map_blocks version of dask_image.imread.imread and compared it to the current implementation using da.concatenate.

map_blocks implementation:

import itertools
import numbers
import warnings

import dask
import dask.array
import dask.delayed
import numpy
import pims

from dask_image.imread import _utils

def imread_mb(fname, nframes=1, *, arraytype="numpy"):
    """
    Read image data into a Dask Array.

    Provides a simple, fast mechanism to ingest image data into a
    Dask Array.

    Parameters
    ----------
    fname : str
        A glob like string that may match one or multiple filenames.
    nframes : int, optional
        Number of the frames to include in each chunk (default: 1).
    arraytype : str, optional
        Array type for dask chunks. Available options: "numpy", "cupy".

    Returns
    -------
    array : dask.array.Array
        A Dask Array representing the contents of all image files.
    """

    if not isinstance(nframes, numbers.Integral):
        raise ValueError("`nframes` must be an integer.")
    if (nframes != -1) and not (nframes > 0):
        raise ValueError("`nframes` must be greater than zero.")

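    if arraytype == "numpy":
        arrayfunc = numpy.asanyarray
    elif arraytype == "cupy":   # pragma: no cover
        import cupy
        arrayfunc = cupy.asanyarray
    else:
        # fail early instead of hitting a NameError on `arrayfunc` below
        raise ValueError('`arraytype` must be "numpy" or "cupy".')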

    with pims.open(fname) as imgs:
        shape = (len(imgs),) + imgs.frame_shape
        dtype = numpy.dtype(imgs.pixel_type)

    if nframes == -1:
        nframes = shape[0]

    if nframes > shape[0]:
        warnings.warn(
            "`nframes` larger than number of frames in file."
            " Will truncate to number of frames in file.",
            RuntimeWarning
        )
    elif shape[0] % nframes != 0:
        warnings.warn(
            "`nframes` does not nicely divide number of frames in file."
            " Last chunk will contain the remainder.",
            RuntimeWarning
        )

    lower_iter, upper_iter = itertools.tee(itertools.chain(
        range(0, shape[0], nframes),
        [shape[0]]
    ))
    next(upper_iter)
    
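#     Previous da.concatenate-based implementation, kept for comparison
#     (lower_iter and upper_iter above are only used by this loop):
#     a = []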
#     for i, j in zip(lower_iter, upper_iter):
#         print(i, j)
#         a.append(dask.array.from_delayed(
#             dask.delayed(_utils._read_frame)(fname, slice(i, j),
#                                              arrayfunc=arrayfunc),
#             (j - i,) + shape[1:],
#             dtype,
#             meta=arrayfunc([])
#         ))
#     a = dask.array.concatenate(a)

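    def func(fname, arrayfunc, block_info=None):
        # block_info[None] describes the output block; 'array-location'
        # holds one (start, stop) index pair per axis of that block
        i, j = block_info[None]['array-location'][0]
        return _utils._read_frame(fname, slice(i, j), arrayfunc=arrayfunc)

    from dask.array.core import normalize_chunks
    a = dask.array.map_blocks(
        func,
        chunks=normalize_chunks((nframes, ) + shape[1:], shape),
        fname=fname,
        arrayfunc=arrayfunc,
        # carry the file's dtype on meta so the result does not report float64
        meta=arrayfunc([]).astype(dtype),
    )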

    return a
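The key step in the function above is reading the block's position from block_info. As a small standalone illustration (a sketch independent of dask-image; `start_index_block` is a hypothetical name), each block of a 1-D array below is filled with its own start index along axis 0:

```python
import dask.array as da
import numpy as np


def start_index_block(block_info=None):
    # block_info[None] describes the output block currently being produced.
    # "array-location" lists one (start, stop) pair per axis.
    start, stop = block_info[None]["array-location"][0]
    return np.full((stop - start,), start, dtype=np.int64)


arr = da.map_blocks(
    start_index_block,
    chunks=((2, 2, 1),),                  # three blocks of length 2, 2 and 1
    meta=np.empty((0,), dtype=np.int64),  # explicit meta, so no inference call
)
result = arr.compute()
print(result)  # [0 0 2 2 4]
```

In imread_mb the same (start, stop) pair is turned into slice(i, j) and handed to the frame reader.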

Comparison:

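# write some dummy data
import os
import tifffile
import numpy as np
os.makedirs('data', exist_ok=True)  # the output directory must exist
for t in range(10000):
    tmpim = np.random.randint(0,1000, [2, 2]).astype(np.uint16)
    tifffile.imsave('data/im_t%03d.tif' %t, tmpim)

# time the current da.concatenate-based implementation
from dask_image import imread
%timeit imread.imread('data/im_*.tif')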

2.96 s ± 46.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

from dask_image import imread
%timeit imread_mb('data/im_*.tif')

150 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

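So there’s a big performance difference: building the lazy array takes about 150 ms with map_blocks versus 2.96 s with da.concatenate, roughly 20 times faster. Note that these timings cover only constructing the Dask array, not computing it.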

Also, indexing the resulting array is faster in the map_blocks version:

im = imread.imread('data/im_*.tif')
im_mb = imread_mb('data/im_*.tif')

def iterate_through_first_axis(im):
    for i in range(im.shape[0]):
        im[i]
    return
%timeit iterate_through_first_axis(im)

11.4 s ± 199 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit iterate_through_first_axis(im_mb)

1.35 s ± 22.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So it seems that map_blocks is the way to go for putting together large images. Happy to open a PR.
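For anyone who wants to repeat the comparison outside IPython and without writing thousands of TIFF files, a rough sketch using the standard timeit module follows. It is an assumption-laden stand-in, not the benchmark above: `load_chunk` fakes the file reader, `N_CHUNKS` is much smaller than 10000, and the numbers it prints will differ from the ones reported here.

```python
import timeit

import dask
import dask.array as da
import numpy as np

N_CHUNKS = 200          # hypothetical; the post used 10000 files
FRAME_SHAPE = (2, 2)


def load_chunk(i):
    """Stand-in for reading one frame from disk."""
    return np.full((1,) + FRAME_SHAPE, i, dtype=np.uint16)


def build_concatenate():
    pieces = [
        da.from_delayed(
            dask.delayed(load_chunk)(i),
            shape=(1,) + FRAME_SHAPE,
            dtype=np.uint16,
        )
        for i in range(N_CHUNKS)
    ]
    return da.concatenate(pieces)


def build_map_blocks():
    def block(block_info=None):
        # "chunk-location" is the block's index along each axis
        return load_chunk(block_info[None]["chunk-location"][0])

    return da.map_blocks(
        block,
        chunks=((1,) * N_CHUNKS, (2,), (2,)),
        meta=np.empty((0, 0, 0), dtype=np.uint16),
    )


def iterate_through_first_axis(arr):
    for i in range(arr.shape[0]):
        arr[i]  # lazy indexing only, nothing is computed


def seconds_per_call(func, *args, repeats=3):
    return timeit.timeit(lambda: func(*args), number=repeats) / repeats


timings = {
    "build (concatenate)": seconds_per_call(build_concatenate),
    "build (map_blocks)": seconds_per_call(build_map_blocks),
    "index (concatenate)": seconds_per_call(iterate_through_first_axis, build_concatenate()),
    "index (map_blocks)": seconds_per_call(iterate_through_first_axis, build_map_blocks()),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.1f} ms")
```

Raising N_CHUNKS brings the setup closer to the 10000-file case, at the cost of a longer run.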

