For some schedulers, setting PIMS image reader's `.class_priority` is ineffective in controlling `dask-image.imread()`

cc: @jmdelahanty

Hi dask-image developers!

Normally, an end user can control which reader pims.open() uses to load images by raising the .class_priority attribute of their preferred PIMS reader before calling pims.open().

import pims

pims.ImageIOReader.class_priority = 100  # we set this very high in order to force pims.open() to use this reader
rgb_frames = pims.open('/path/to/video/file.mpg')  # uses ImageIOReader
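
To make the priority mechanism concrete, here is a minimal toy model of priority-based reader selection. It is only a sketch: `BaseReader`, `ReaderA`, `ReaderB` and `pick_reader` are hypothetical names invented for illustration and are not part of PIMS, and the real pims.open() may differ in detail (for example, in how it handles a reader that fails to open the file).

```python
class BaseReader:
    """Hypothetical stand-in for a reader class (not the PIMS API)."""
    class_priority = 10
    extensions = frozenset()

    @classmethod
    def supports(cls, filename):
        # A reader is eligible if it lists the file's extension.
        return filename.rsplit(".", 1)[-1].lower() in cls.extensions


class ReaderA(BaseReader):
    class_priority = 10
    extensions = frozenset({"mpg", "avi"})


class ReaderB(BaseReader):
    class_priority = 5
    extensions = frozenset({"mpg", "tif"})


def pick_reader(filename, readers):
    """Return the eligible reader class with the highest class_priority."""
    candidates = [r for r in readers if r.supports(filename)]
    if not candidates:
        raise ValueError(f"no reader supports {filename!r}")
    return max(candidates, key=lambda r: r.class_priority)


readers = [ReaderA, ReaderB]
print(pick_reader("clip.mpg", readers).__name__)  # ReaderA

# Raising the class attribute changes which reader wins from now on.
ReaderB.class_priority = 100
print(pick_reader("clip.mpg", readers).__name__)  # ReaderB
```

Because the priority is a plain class attribute, the assignment only affects the Python process in which it runs.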

Since dask-image's imread() calls pims.open() internally, it would be great if it honored the same setting.

import dask_image.imread
import pims
pims.ImageIOReader.class_priority = 100  # we set this very high in order to force dask's imread() to use this reader [via pims.open()]
rgb_frames = dask_image.imread.imread('/path/to/video/file.mpg')  # uses ImageIOReader

And indeed this works for dask-image's imread() with single-process schedulers such as "threading" and "sync", because every task runs in the process where .class_priority was set. But I do not know of a way to make all worker processes of a multi-process scheduler aware of the preferred reader's increased .class_priority. Any help here would be greatly appreciated.

Alternatively, it might be an idea to modify dask-image.imread() to receive a “reader” keyword argument which indicates the end-user’s preferred PIMS reader.

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 15 (3 by maintainers)

Top GitHub Comments

3 reactions
ParticularMiner commented, May 13, 2022

Many thanks @jakirkham !

"The simplest solution is to just provide your own ProcessPoolExecutor"

I followed your first suggestion since that was the easiest one to understand (as you guessed 😄). And it works (see the following code-snippet)!

import dask_image.imread  # import the submodule explicitly so dask_image.imread.imread resolves
import pims


def initialize_worker_process():
    """
    Initialize a worker process before running any tasks in it.
    """
    # If Numpy is already imported, presumably its random state was
    # inherited from the parent => re-seed it.
    import sys
    
    np = sys.modules.get("numpy")
    if np is not None:
        np.random.seed()
    
    # We increase the priority of ImageIOReader in order to force dask's 
    # imread() to use this reader [via pims.open()]
    pims.ImageIOReader.class_priority = 100


def get_pool_with_reader_priority_set(num_workers=None):
    import os
    from dask import config
    from dask.system import CPU_COUNT
    from dask.multiprocessing import get_context
    from concurrent.futures import ProcessPoolExecutor
    
    num_workers = num_workers or config.get("num_workers", None) or CPU_COUNT
    if os.environ.get("PYTHONHASHSEED") in (None, "0"):
        # This number is arbitrary; it was chosen to commemorate
        # https://github.com/dask/dask/issues/6640.
        os.environ["PYTHONHASHSEED"] = "6640"
    context = get_context()
    return ProcessPoolExecutor(
        num_workers, mp_context=context, initializer=initialize_worker_process
    )


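if __name__ == "__main__":  # guard the entry point: spawned worker processes re-import this module
    rgb_frames = dask_image.imread.imread('/path/to/video/file.mpg')
    # Each worker runs initialize_worker_process() first, so every process uses ImageIOReader
    rgb_frames.compute(scheduler='processes', pool=get_pool_with_reader_priority_set())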

I suppose a PR that spares end users from getting their hands dirty with the innards of the multi-process scheduler would be a good idea.

But before that, perhaps I should try dask.distributed.

2 reactions
jakirkham commented, May 19, 2022

Initializer customization was added in PR https://github.com/dask/dask/pull/9087, which should be in the next Dask release.
