Getting movie files into dask efficiently

dask-image version: 0.2.0
Python version: 3.7
Operating System: Mac OSX

Description

I’m interested in getting movie files (.mov, .mpeg, .avi, basically anything readable with ffmpeg) into dask in a nice way, i.e. something like dask_image.imread.imread that can also accept these formats.

It is possible to read these formats into Python via ffmpeg using libraries like imageio or pyav, but these tend to return video objects that expose iterators or get-frame methods. I would instead like a dask array that I can index lazily to get just what I need, and that is highly performant.

Note there has been some discussion around this on an image.sc post I made, including caveats around attempts at full random access when looking at movie files. I am fine with caching intermediate results to make accessing neighboring frames fast, and I’m fine if making big jumps in the movie is slow, but accessing nearby frames should be fast. I’m interested in using this for interactive movie visualisation in napari, so it is reasonable to expect that most of the time people will be looking at frames in order, but they might want to jump around, and things should be cached nicely in that case too.
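The caching behaviour described above could be sketched in plain Python roughly as follows. This is only an illustration under stated assumptions: `BlockCachedVideo`, `fake_decode`, and the block size of 32 are hypothetical names and values, and the decoder is a stand-in rather than a real ffmpeg reader. The idea is that asking for one frame decodes and caches its whole block, so neighboring frames are served from memory while big jumps pay for a fresh decode.

```python
from functools import lru_cache

BLOCK = 32  # frames decoded together; hypothetical tuning value


class BlockCachedVideo:
    """Serve single frames while decoding and caching whole blocks."""

    def __init__(self, decode_block, n_frames, block=BLOCK, max_blocks=8):
        self.n_frames = n_frames
        self.block = block
        self.decode_calls = 0  # counts how often the decoder is actually hit
        self._decode_block = decode_block
        # Keep only the most recently used blocks in memory.
        self._cached = lru_cache(maxsize=max_blocks)(self._load)

    def _load(self, block_index):
        self.decode_calls += 1
        start = block_index * self.block
        stop = min(start + self.block, self.n_frames)
        return self._decode_block(start, stop)

    def __getitem__(self, i):
        if not 0 <= i < self.n_frames:
            raise IndexError(i)
        frames = self._cached(i // self.block)
        return frames[i % self.block]


def fake_decode(start, stop):
    # Stand-in for a real sequential decoder (assumption, not a library API).
    return [f"frame-{i}" for i in range(start, stop)]


video = BlockCachedVideo(fake_decode, n_frames=100)
```

With this setup, reading `video[0]` and then `video[1]` triggers a single decode, while jumping to `video[99]` triggers a second one for the last block.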

What I Did

I made some attempts at this myself by modifying the dask_image.imread.imread code:

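import imageio
import numpy as np
from dask import delayed
import dask.array as da
from dask.cache import Cache

cache = Cache(2e9)  # Leverage two gigabytes of memory
cache.register()

def dask_from_mov(path):
    # Open the movie through imageio's ffmpeg plugin.
    vid = imageio.get_reader(path, 'ffmpeg')
    # 'size' is (width, height); reverse it and add the RGB channel axis.
    shape = vid.get_meta_data()['size'][::-1] + (3,)
    # One delayed task per frame, all sharing the same reader.
    lazy_imread = delayed(vid.get_data)
    frames = [
        da.from_delayed(lazy_imread(i), shape=shape, dtype=np.uint8)
        for i in range(vid.count_frames())
    ]
    return da.stack(frames)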

There are more code snippets and links to some .mov files in the image.sc post mentioned above if people want more detail.

Overall performance of that approach was not very good. I can do some benchmarking, but I suspect that what I’m doing is horribly inefficient from a decoding standpoint, and that there might be a lower level of the ffmpeg reader to connect with dask. Curious if anyone here has any experience with this, or ideas?
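One direction that might reduce the per-frame overhead is to let each dask task decode a run of consecutive frames, so the array has one chunk per run instead of one per frame. The following is a minimal sketch of that idea, not a tested solution for real movies: `decode_range`, `lazy_movie`, and `frames_per_chunk` are hypothetical names, and the decoder just fills synthetic frames where a real one would read sequentially from ffmpeg.

```python
import numpy as np
import dask
import dask.array as da


def decode_range(start, stop, shape):
    """Hypothetical stand-in for sequentially decoding frames [start, stop)."""
    frames = np.empty((stop - start,) + shape, dtype=np.uint8)
    for k, i in enumerate(range(start, stop)):
        frames[k] = i % 256  # synthetic content: every pixel holds the frame index
    return frames


def lazy_movie(n_frames, shape, frames_per_chunk=16):
    """Build a lazy (frames, height, width, channels) array, one task per run."""
    lazy_decode = dask.delayed(decode_range, pure=True)
    chunks = []
    for start in range(0, n_frames, frames_per_chunk):
        stop = min(start + frames_per_chunk, n_frames)
        block = lazy_decode(start, stop, shape)
        chunks.append(
            da.from_delayed(block, shape=(stop - start,) + shape, dtype=np.uint8)
        )
    return da.concatenate(chunks, axis=0)


movie = lazy_movie(n_frames=40, shape=(4, 6, 3))
```

Indexing `movie[17]` only computes the chunk that holds frame 17, and a cache such as the `dask.cache.Cache` registered in the snippet above may then keep that chunk around for nearby frames. The right `frames_per_chunk` would depend on the codec and frame size and would need benchmarking.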

Top GitHub Comments

danielballan commented, Mar 13, 2020

Thanks, that’s useful to hear. PIMS and napari have started talking more recently and I expect that to continue, so hopefully we can work together to smooth this out.

sofroniewn commented, Mar 13, 2020

That would be great @danielballan. The next napari release (0.3.0, which should be out in < 2 weeks) will be the first one that supports the addition of reader plugins, by @tlambert03. We’ve got the basic machinery merged into master and are now working on a few details and documentation, see https://github.com/napari/napari/pull/1030. At that point I’d love to see both PIMS and dask-image be able to load data into napari via our plugin mechanism (which is hopefully pretty lightweight and not too far from where you are now).