Improve dask-image imread
The function that opens each file name and reads frame `i` is defined as:
```python
def _read_frame(fn, i, *, arrayfunc=numpy.asanyarray):
    with pims.open(fn) as imgs:
        return arrayfunc(imgs[i])
```
In my particular case, each file has time, channel, z (single Z projections for now), y and x dimensions. It's a 6 by 5 panel with 3 channels at 2044x2048, totalling 30 files (named Image{i}.ome.tiff with i ranging from 0 to 29).
I want to lazy-load from disk (at least one time frame at a time) and display the panels assembled.
Right now I am testing by assembling the panels to view live-cell mode in napari, and I basically create a massive array fully loaded into RAM, which can't grow beyond a certain size. Currently I am only trying a horizontal stitch of 1x4:
```python
import numpy as np

print(stack.shape)
n_cols = 1
n_rows = 4
# Preallocate one full-size array per channel (everything lives in RAM)
tet = np.zeros((143, 2044 * n_cols, 2048 * n_rows))
integrase = np.zeros((143, 2044 * n_cols, 2048 * n_rows))
nuclei = np.zeros((143, 2044 * n_cols, 2048 * n_rows))
i = 0
times = 143
for col in range(n_cols):
    for row in range(n_rows):
        # Copy each position's frames into its slot in the stitched arrays
        tet[:times, 2044 * col:2044 * (col + 1), 2048 * row:2048 * (row + 1)] = stack[i, :times, :, :, 0]
        integrase[:times, 2044 * col:2044 * (col + 1), 2048 * row:2048 * (row + 1)] = stack[i, :times, :, :, 1]
        nuclei[:times, 2044 * col:2044 * (col + 1), 2048 * row:2048 * (row + 1)] = stack[i, :times, :, :, 2]
        i += 1
```
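A lazier sketch of the same assembly, assuming `stack` really has the `(position, time, y, x, channel)` shape printed above and that positions follow the same order as the loop, would keep everything as a dask array instead of preallocating RAM (the grid variables and the `da.block` call here are illustrative, not part of the code above):

```python
import dask.array as da

# Sketch: build the montage lazily instead of filling preallocated NumPy
# arrays. Assumes stack.shape == (n_positions, times, 2044, 2048, 3) and that
# positions run across the grid in the same order as the loop above.
n_y_tiles, n_x_tiles = 1, 4                # the 1x4 horizontal stitch

def montage_channel(stack, channel):
    # One (times, 2044, 2048) tile per position for the chosen channel;
    # da.block stitches the inner lists along x and the outer list along y.
    tiles = [[stack[r * n_x_tiles + c, :, :, :, channel] for c in range(n_x_tiles)]
             for r in range(n_y_tiles)]
    return da.block(tiles)                 # still lazy, nothing read yet

tet = montage_channel(stack, 0)            # (times, 2044, 2048 * 4)
integrase = montage_channel(stack, 1)
nuclei = montage_channel(stack, 2)
```

napari can display such dask arrays directly, pulling frames from disk as needed.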
The way imread (or the dask.delayed version below) is set up, it reads each file and opens a pims instance from which [i] is indexed. However, reshaping does not work when I try to assemble the result into a larger X and Y; I assume it fails at the channel level (errors below).
```python
from dask_image.imread import imread

stack = imread('/home/jmamede/Data/tet/tetMoon20201127/*ome.tiff')
print(stack.shape)
stack = stack.reshape(143, 2044, 2048*4, 3)
```
> (4, 143, 2044, 2048, 3)
> ---------------------------------------------------------------------------
> ValueError Traceback (most recent call last)
> <ipython-input-14-98a1b37ead9a> in <module>
> 10
> 11 print(stack.shape)
> ---> 12 stack = stack.reshape(143,2044, 2048*4,3)
>
> ~/anaconda3/envs/pycuda/lib/python3.7/site-packages/dask/array/core.py in reshape(self, *shape)
> 1795 if len(shape) == 1 and not isinstance(shape[0], Number):
> 1796 shape = shape[0]
> -> 1797 return reshape(self, shape)
> 1798
> 1799 def topk(self, k, axis=-1, split_every=None):
>
> ~/anaconda3/envs/pycuda/lib/python3.7/site-packages/dask/array/reshape.py in reshape(x, shape)
> 193
> 194 # Logic for how to rechunk
> --> 195 inchunks, outchunks = reshape_rechunk(x.shape, shape, x.chunks)
> 196 x2 = x.rechunk(inchunks)
> 197
>
> ~/anaconda3/envs/pycuda/lib/python3.7/site-packages/dask/array/reshape.py in reshape_rechunk(inshape, outshape, inchunks)
> 42 ileft -= 1
> 43 if reduce(mul, inshape[ileft : ii + 1]) != dout:
> ---> 44 raise ValueError("Shapes not compatible")
> 45
> 46 for i in range(ileft + 1, ii + 1): # need single-shape dimensions
>
> ValueError: Shapes not compatible
Or, building the stack with dask.delayed directly:
```python
import dask.array as da
from dask import delayed

lazy_imread = delayed(imread)
lazy_arrays = [lazy_imread(fn) for fn in filelist]
dask_arrays = [da.from_delayed(delayed_reader, shape=(143, 2044, 2048, 3), dtype='uint16')
               for delayed_reader in lazy_arrays]
stack = da.stack(dask_arrays, axis=0)
print(stack.shape)
stack = stack.reshape(143, 2044, 2048*4, 3)
```
> (4, 143, 2044, 2048, 3)
> ---------------------------------------------------------------------------
> ValueError Traceback (most recent call last)
> <ipython-input-15-91398ced57b4> in <module>
> 10 #
> 11 print(stack.shape)
> ---> 12 stack = stack.reshape(143,2044, 2048*4,3)
>
> ~/anaconda3/envs/pycuda/lib/python3.7/site-packages/dask/array/core.py in reshape(self, *shape)
> 1795 if len(shape) == 1 and not isinstance(shape[0], Number):
> 1796 shape = shape[0]
> -> 1797 return reshape(self, shape)
> 1798
> 1799 def topk(self, k, axis=-1, split_every=None):
>
> ~/anaconda3/envs/pycuda/lib/python3.7/site-packages/dask/array/reshape.py in reshape(x, shape)
> 193
> 194 # Logic for how to rechunk
> --> 195 inchunks, outchunks = reshape_rechunk(x.shape, shape, x.chunks)
> 196 x2 = x.rechunk(inchunks)
> 197
>
> ~/anaconda3/envs/pycuda/lib/python3.7/site-packages/dask/array/reshape.py in reshape_rechunk(inshape, outshape, inchunks)
> 42 ileft -= 1
> 43 if reduce(mul, inshape[ileft : ii + 1]) != dout:
> ---> 44 raise ValueError("Shapes not compatible")
> 45
> 46 for i in range(ileft + 1, ii + 1): # need single-shape dimensions
>
> ValueError: Shapes not compatible
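Incidentally, the reshape itself cannot succeed here: going from `(4, 143, 2044, 2048, 3)` to `(143, 2044, 2048*4, 3)` would require moving the leading file axis in between y and x, which reshape does not do even though the total element count matches. A hedged sketch of the concatenation that the reshape was trying to express (assuming the four files are horizontally adjacent tiles):

```python
import dask.array as da

# Sketch: split along the file axis and concatenate along x instead of
# reshaping. Assumes the 4 files are adjacent tiles along x.
tiles = [stack[i] for i in range(stack.shape[0])]   # each (143, 2044, 2048, 3)
mosaic = da.concatenate(tiles, axis=2)              # -> (143, 2044, 8192, 3)
print(mosaic.shape)
```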
My questions are:
Each time we call a different i for a given filename with imread, is a new pims.open() instance created, or is the same one reused for each file?
Would something like this work (initialized via imread(fnames_list, channel_to_be_picked_up))?
```python
def _read_frame_improved_JM(fn, i, ch, bundle='zyx', iter='t', *, arrayfunc=numpy.asanyarray):
    with pims.open(fn) as imgs:
        imgs.iter_axes = iter
        imgs.bundle_axes = bundle
        imgs.default_coords['c'] = ch
        return arrayfunc(imgs[i])
```
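If a reader like that is viable, a rough sketch of how it could be driven lazily per time point and channel from outside dask-image (the frame shape, dtype, time count and `bundle='yx'` choice here are assumptions for illustration, not the library's API):

```python
import dask
import dask.array as da

# Sketch: wrap the proposed reader in dask.delayed, one task per time point.
# Assumes _read_frame_improved_JM (and its numpy/pims imports) is defined above.
n_times, frame_shape, dtype = 143, (2044, 2048), 'uint16'

def lazy_channel(fn, ch):
    frames = [
        da.from_delayed(
            dask.delayed(_read_frame_improved_JM)(fn, t, ch, bundle='yx'),
            shape=frame_shape, dtype=dtype)
        for t in range(n_times)
    ]
    return da.stack(frames, axis=0)   # (t, y, x), one file read per frame

# e.g. tet = lazy_channel('Image0.ome.tiff', ch=0)
```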
Or should I create two functions within imread/__init__.py, one to initialize the pims file instance (basically like the current _read_frame()) and another to read each frame with a certain shape:
```python
def _initialize_pims(fn):
    # Return an open pims reader; the caller is responsible for closing it
    return pims.open(fn)

def _read_frame(pims_object, t, ch, bundle='zyx', iter='t', *, arrayfunc=numpy.asanyarray):
    pims_object.iter_axes = iter
    pims_object.bundle_axes = bundle
    pims_object.default_coords['c'] = ch
    return arrayfunc(pims_object[t])
```
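To make the intent of that split concrete, a minimal usage sketch (assuming the versions above; the caller, or dask-image, would then own the reader's lifetime across frame reads):

```python
# Usage sketch for the two-function split proposed above.
imgs = _initialize_pims('Image0.ome.tiff')   # open the pims reader once
frame_t0_ch1 = _read_frame(imgs, 0, 1)       # reuse it for many frame reads
frame_t5_ch2 = _read_frame(imgs, 5, 2)
imgs.close()                                 # caller closes the file
```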
Any guidance is appreciated before I start coding things that might be simpler than what I’m thinking of doing.
Thanks!
Top GitHub Comments
`pims.open()` is a function, it's not creating a class instance or anything like that here. I hope that helps clear up some of the confusion.

It sounds like you're trying to combine a lot of smaller image files into one large image volume. It might be more straightforward for you to use the `block_info` keyword argument to `dask.array.map_blocks()` to specify the array-location of the blocks. This is likely to have better performance than reshaping a Dask array after constructing it. The dask array API docs have a small section on `block_info`, and there is also a page on combining dask arrays.

I will eventually try it; feel free to close if you think it was fixed. I wrote a new function to do my bit, based on the one I posted above.
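For anyone finding this later, a rough sketch of the `block_info` suggestion (the grid layout, file naming, frame shape, dtype and `bundle_axes` below are assumptions for illustration; see the dask docs for the keyword's exact semantics):

```python
import numpy as np
import dask.array as da
import pims

n_rows, n_cols = 5, 6                      # assumed grid of stage positions
t, y, x, c = 143, 2044, 2048, 3            # assumed per-file dimensions

def load_tile(block, block_info=None):
    # The output block's chunk-location tells us which (time, row, col) tile
    # of the mosaic this block is, so we can read the matching file and frame.
    ti, row, col, _ = block_info[None]['chunk-location']
    fn = f'Image{row * n_cols + col}.ome.tiff'      # assumed file ordering
    with pims.open(fn) as imgs:
        imgs.iter_axes = 't'
        imgs.bundle_axes = 'yxc'
        frame = np.asarray(imgs[ti])                # (y, x, c)
    return frame[np.newaxis].astype(block.dtype)    # match the (1, y, x, c) block

template = da.zeros((t, y * n_rows, x * n_cols, c),
                    chunks=(1, y, x, c), dtype='uint16')
mosaic = da.map_blocks(load_tile, template, dtype='uint16')
```

Opening the file inside every block keeps the tasks self-contained, at the cost of reopening the reader once per frame.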