Getting movie files into dask efficiently
- dask-image version: 0.2.0
- Python version: 3.7
- Operating System: Mac OSX
Description
I’m interested in getting movie files (.mov, .mpeg, .avi; basically anything readable with ffmpeg) into dask in a nice way, i.e. something like dask_image.imread.imread but that can accept these formats.
It is possible to read these formats into Python via ffmpeg using libraries like imageio.imread or pyav, but these tend to return video objects that expose iterators or get-frame methods. What I would like instead is a dask array that I can index lazily to get just the frames I need, and have it be highly performant.
Note there has been some discussion around this on an image.sc post I made, including caveats around attempts at full random access when looking at movie files. I am fine with caching intermediate results to make accessing neighboring frames fast, and I’m fine if making big jumps in the movie is slow, but accessing nearby frames should be fast. I’m interested in using this for interactive movie visualisation using napari, so it is reasonable to expect that most of the time people will be looking at frames in order, but they might want to jump around, and things should be cached nicely in that case too.
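To make the caching requirement concrete, here is a minimal sketch of the access pattern described above: a request for one frame decodes its whole block of neighbours sequentially and keeps a few recent blocks in memory. The class name `BlockCachedFrames` and the `decode_range(start, stop)` callable are hypothetical placeholders for whatever decoder is actually used; this is an illustration, not an existing API in dask-image, imageio or PIMS.

```python
from collections import OrderedDict


class BlockCachedFrames:
    """Serve frames by index while decoding and caching whole blocks of neighbours.

    `decode_range(start, stop)` is a hypothetical callable that returns a list of
    frames start..stop-1, decoded in order (the cheap access pattern for video).
    """

    def __init__(self, decode_range, n_frames, block=32, max_blocks=8):
        self._decode_range = decode_range
        self._n_frames = n_frames
        self._block = block
        self._max_blocks = max_blocks
        self._cache = OrderedDict()  # block index -> list of frames, oldest first

    def __len__(self):
        return self._n_frames

    def __getitem__(self, index):
        if not 0 <= index < self._n_frames:
            raise IndexError(index)
        block_id, offset = divmod(index, self._block)
        if block_id in self._cache:
            self._cache.move_to_end(block_id)  # mark as most recently used
        else:
            start = block_id * self._block
            stop = min(start + self._block, self._n_frames)
            self._cache[block_id] = self._decode_range(start, stop)
            if len(self._cache) > self._max_blocks:
                self._cache.popitem(last=False)  # evict the least recently used block
        return self._cache[block_id][offset]
```

With this layout, stepping through frames in order costs one decode per block, while a big jump costs one extra block decode and possibly an eviction. The block size and the number of cached blocks are tuning knobs, not recommended values.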
What I Did
I made some attempts at this myself by modifying the dask_image.imread.imread code - see here
import imageio
import numpy as np
from dask import delayed
import dask.array as da
from dask.cache import Cache

cache = Cache(2e9)  # Leverage two gigabytes of memory
cache.register()

def dask_from_mov(path):
    vid = imageio.get_reader(path, 'ffmpeg')
    # 'size' is (width, height); reverse it and append the RGB channel axis
    shape = vid.get_meta_data()['size'][::-1] + (3,)
    lazy_imread = delayed(vid.get_data)
    # One delayed task per frame, stacked along a new leading (time) axis
    frames = [da.from_delayed(lazy_imread(i), shape=shape, dtype=np.uint8)
              for i in range(vid.count_frames())]
    return da.stack(frames)
There are more code snippets and links to some .mov files in the image.sc post linked to above if people want more detail.
Overall performance of that approach was not very good. I can do some benchmarking etc., but I suspect that what I’m doing is horribly inefficient from a decoding standpoint and that there might be a lower level of the ffmpeg reader to connect with dask. Curious if anyone here has any experience with this or ideas?
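One variation that might reduce the per-frame overhead is to create one delayed task per block of consecutive frames instead of one per frame, so each task decodes sequentially and the graph stays small. The sketch below assumes this helps for ffmpeg-backed readers, which has not been benchmarked here. `frames_to_dask`, `imageio_block_reader`, `read_block` and the default block size are hypothetical names and values chosen for illustration; only `dask.delayed`, `da.from_delayed`, `da.concatenate` and the imageio reader calls already used above are real APIs.

```python
import numpy as np
import dask
import dask.array as da


def frames_to_dask(read_block, n_frames, frame_shape, dtype=np.uint8, block=32):
    """Build a lazy (n_frames, *frame_shape) array with one task per block of frames.

    `read_block(start, stop)` is a hypothetical user-supplied callable returning
    frames start..stop-1 as one array of shape (stop - start, *frame_shape).
    """
    lazy_read = dask.delayed(read_block)
    pieces = []
    for start in range(0, n_frames, block):
        stop = min(start + block, n_frames)
        pieces.append(
            da.from_delayed(
                lazy_read(start, stop),
                shape=(stop - start,) + tuple(frame_shape),
                dtype=dtype,
            )
        )
    # Each piece becomes one chunk along the leading (time) axis
    return da.concatenate(pieces, axis=0)


def imageio_block_reader(path):
    """Return a read_block callable backed by imageio's ffmpeg plugin."""
    import imageio  # imported lazily so the helper above has no video dependency

    def read_block(start, stop):
        # Open a fresh reader per task so tasks do not share decoder state,
        # then decode the block's frames in order.
        reader = imageio.get_reader(path, "ffmpeg")
        try:
            return np.stack([reader.get_data(i) for i in range(start, stop)])
        finally:
            reader.close()

    return read_block
```

Usage would look like `frames_to_dask(imageio_block_reader(path), n_frames, (height, width, 3))`, with the frame count and size taken from the reader's metadata as in the snippet above. Indexing a single frame then computes only the block that contains it, which pairs naturally with a cache so that neighbouring frames come back without re-decoding.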
Issue Analytics
- State:
- Created 4 years ago
- Reactions: 2
- Comments: 9 (2 by maintainers)
Top GitHub Comments
Thanks, that’s useful to hear. PIMS and napari have started talking more recently and I expect that to continue, so hopefully we can work together to smooth this out.
That would be great @danielballan - the next napari release (0.3.0, which should be out in < 2 weeks) will be the first one that supports the addition of reader plugins by @tlambert03 - we’ve got the basic machinery merged into master, and are now working on a few details and documentation, see https://github.com/napari/napari/pull/1030. At that point I’d love to see both PIMS and dask-image be able to load data into napari via our plugin mechanism (which is hopefully pretty lightweight and not too far from where you are now).