Loky BrokenProcessPool

Recently PyMC3 has been seeing Travis failures, and I think this coincides with the joblib 0.13 release. The errors look like this:

pymc3/sampling.py:440: in sample
    trace = _mp_sample(**sample_args)
pymc3/sampling.py:1033: in _mp_sample
    traces = Parallel(n_jobs=cores, mmap_mode=None)(jobs)
../../../miniconda2/envs/testenv/lib/python2.7/site-packages/joblib/parallel.py:930: in __call__
    self.retrieve()
../../../miniconda2/envs/testenv/lib/python2.7/site-packages/joblib/parallel.py:833: in retrieve
    self._output.extend(job.get(timeout=self.timeout))
../../../miniconda2/envs/testenv/lib/python2.7/site-packages/joblib/_parallel_backends.py:521: in wrap_future_result
    return future.result(timeout=timeout)
../../../miniconda2/envs/testenv/lib/python2.7/site-packages/joblib/externals/loky/_base.py:433: in result
    return self.__get_result()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <Future at 0x7f24374622d0 state=finished raised BrokenProcessPool>
    def __get_result(self):
        if self._exception:
>           raise self._exception
E           BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
E           
E           This was caused directly by 
E           '''
E           Traceback (most recent call last):
E             File "/home/travis/miniconda2/envs/testenv/lib/python2.7/site-packages/joblib/externals/loky/process_executor.py", line 391, in _process_worker
E               call_item = call_queue.get(block=True, timeout=timeout)
E             File "/home/travis/miniconda2/envs/testenv/lib/python2.7/multiprocessing/queues.py", line 135, in get
E               res = self._recv()
E             File "/home/travis/build/pymc-devs/pymc3/pymc3/step_methods/arraystep.py", line 39, in __new__
E               model = modelcontext(kwargs.get('model'))
E             File "/home/travis/build/pymc-devs/pymc3/pymc3/model.py", line 191, in modelcontext
E               return Model.get_context()
E             File "/home/travis/build/pymc-devs/pymc3/pymc3/model.py", line 183, in get_context
E               raise TypeError("No context on context stack")
E           TypeError: No context on context stack
E           '''
../../../miniconda2/envs/testenv/lib/python2.7/site-packages/joblib/externals/loky/_base.py:381: BrokenProcessPool

https://travis-ci.org/pymc-devs/pymc3/jobs/455064428

Any ideas? Was the pickling behavior changed somehow?

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

8 reactions
pierreglaser commented, Dec 5, 2018

Hi @twiecki, I have been looking at the error in the build, and here is what I think is going on.

Recent versions of joblib use loky as the default backend, where Python objects used in the child processes must be serialized to be sent to the workers.

The problem with your current test is that the sgfs instance being serialized relies, at creation time, on a class attribute (Context.contexts) that is mutated during execution of the parent process (when Model.__enter__ is called). Unfortunately, that mutation does not happen in the child processes, hence the error.
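A single-process sketch of this mechanism, using only the standard library. The classes Context and Step here are hypothetical stand-ins, not PyMC3's actual classes, and unpickling outside the `with` block only imitates a fresh worker whose context stack was never populated.

```python
import pickle


class Context:
    """Hypothetical stand-in for a model class with a class-level context stack."""

    contexts = []  # class attribute, mutated only by __enter__/__exit__

    def __enter__(self):
        type(self).contexts.append(self)
        return self

    def __exit__(self, *exc_info):
        type(self).contexts.pop()

    @classmethod
    def get_context(cls):
        if not cls.contexts:
            raise TypeError("No context on context stack")
        return cls.contexts[-1]


class Step:
    """Hypothetical step object whose creation looks up the active context."""

    def __new__(cls, *args, **kwargs):
        obj = super().__new__(cls)
        # Unpickling calls __new__ again, so this lookup also runs in the worker.
        obj.context_name = type(Context.get_context()).__name__
        return obj


# "Parent process": the stack is populated, so creating and pickling works.
with Context():
    payload = pickle.dumps(Step())

# "Worker process": the stack is empty, so re-creating the object fails.
try:
    pickle.loads(payload)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

In this sketch the failure comes from `__new__` running again during unpickling while `Context.contexts` is empty, which matches the `TypeError` at the bottom of the traceback above.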

A quick fix is to use “multiprocessing” instead of “loky” as the joblib backend, since it does not require any input serialization: Parallel(backend='multiprocessing'). With this change, the test passed on my laptop.

For the record, the tests pass on Python 3 because concurrent.futures also uses multiprocessing as its default backend.

One tricky case would be running this test on Windows, because the only available method to create new processes there is spawn, so objects must be serialized at some point. However, I did not find any CI scripts running on Windows for pymc3. Could you confirm that?

1 reaction
ogrisel commented, Jan 29, 2019

Note that the fork start method of multiprocessing used by default in concurrent.futures is still problematic (crash or freeze) when using libraries that use OpenMP thread pools (e.g. lightgbm, xgboost, and soon scikit-learn).

So I would advise you to use the spawn or forkserver start method with concurrent.futures, in which case the mutated attribute will not be visible in the child processes.

When making parallel calls with concurrent.futures, you should probably pass the model context explicitly.
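One way to read that advice, as a minimal sketch assuming Python 3.7+ (for the `mp_context` argument). ModelConfig, run_chain and run_chains are hypothetical names for illustration, not PyMC3 or joblib APIs. Each worker gets everything it needs as arguments instead of reading class-level state that only the parent mutated.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


class ModelConfig:
    """Hypothetical, picklable description of a model (not a PyMC3 class)."""

    def __init__(self, name, scale):
        self.name = name
        self.scale = scale


def run_chain(config, chain_id):
    # Everything the worker needs arrives as an argument, so nothing here
    # depends on a context stack that exists only in the parent process.
    return f"{config.name}-chain{chain_id}", config.scale * chain_id


def run_chains(config, n_chains, start_method="spawn"):
    # "spawn" starts fresh interpreters, so arguments are pickled and sent over.
    ctx = multiprocessing.get_context(start_method)
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        futures = [pool.submit(run_chain, config, i) for i in range(n_chains)]
        return [future.result() for future in futures]
```

With the spawn start method, `run_chains(ModelConfig("demo", 2.0), n_chains=3)` should be called from under an `if __name__ == "__main__":` guard in a script, since each worker re-imports the main module.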
