No simple way to pass initializer for process

multiprocessing.Pool has a handy initializer parameter that takes a callable for setting up per-process resources (database connections, loggers, etc.), but joblib doesn’t expose a way to pass one.

I see that in 0.10 I can pass a custom multiprocessing context, which I hope I can use to achieve this, but per-process setup is something many users are likely to need, so it would be good to have an easier way.

(I’d suggest an initializer parameter to Parallel that’s picked up by MultiprocessingBackend)

Issue Analytics

  • State: open
  • Created: 7 years ago
  • Reactions: 7
  • Comments: 5

Top GitHub Comments

5 reactions
gdlmx commented, Apr 8, 2019

For anyone looking for a dirty workaround, I wrote a simple function that injects an additional initializer into a Parallel instance using the Loky backend.

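def with_initializer(self, f_init):
    # Relies on private joblib/loky attributes, so it may break between versions.
    # Make sure the backend has already created its worker pool.
    if not hasattr(self._backend, '_workers'):
        self.__enter__()
    origin_init = self._backend._workers._initializer

    def new_init():
        # Run the pool's original initializer first, then the user-supplied one.
        origin_init()
        f_init()

    self._backend._workers._initializer = new_init if callable(origin_init) else f_init
    return self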

Example usage:

import matlab
from joblib import Parallel, delayed

x = matlab.double([[0.0]]) # this object can only be loaded after importing matlab

def f(i):
    print(i, x)

def _init_matlab():
    import matlab

with Parallel(4) as para:
    for _ in with_initializer(para, _init_matlab)(delayed(f)(i) for i in range(10)):
        pass

Data objects from some complex libraries, such as matlab, can only be loaded after the Python module has been imported. An initializer seems to be the only way to guarantee that a third-party module is loaded before the child processes try to unpickle those global data objects.

3 reactions
astromancer commented, Dec 7, 2017

Custom initializers for processes are a must-have feature!
