Loky BrokenProcessPool
As of recently, PyMC3 is seeing Travis failures; I think this coincides with joblib 0.13. The errors look like this:
pymc3/sampling.py:440: in sample
trace = _mp_sample(**sample_args)
pymc3/sampling.py:1033: in _mp_sample
traces = Parallel(n_jobs=cores, mmap_mode=None)(jobs)
../../../miniconda2/envs/testenv/lib/python2.7/site-packages/joblib/parallel.py:930: in __call__
self.retrieve()
../../../miniconda2/envs/testenv/lib/python2.7/site-packages/joblib/parallel.py:833: in retrieve
self._output.extend(job.get(timeout=self.timeout))
../../../miniconda2/envs/testenv/lib/python2.7/site-packages/joblib/_parallel_backends.py:521: in wrap_future_result
return future.result(timeout=timeout)
../../../miniconda2/envs/testenv/lib/python2.7/site-packages/joblib/externals/loky/_base.py:433: in result
return self.__get_result()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Future at 0x7f24374622d0 state=finished raised BrokenProcessPool>
def __get_result(self):
if self._exception:
> raise self._exception
E BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
E
E This was caused directly by
E '''
E Traceback (most recent call last):
E File "/home/travis/miniconda2/envs/testenv/lib/python2.7/site-packages/joblib/externals/loky/process_executor.py", line 391, in _process_worker
E call_item = call_queue.get(block=True, timeout=timeout)
E File "/home/travis/miniconda2/envs/testenv/lib/python2.7/multiprocessing/queues.py", line 135, in get
E res = self._recv()
E File "/home/travis/build/pymc-devs/pymc3/pymc3/step_methods/arraystep.py", line 39, in __new__
E model = modelcontext(kwargs.get('model'))
E File "/home/travis/build/pymc-devs/pymc3/pymc3/model.py", line 191, in modelcontext
E return Model.get_context()
E File "/home/travis/build/pymc-devs/pymc3/pymc3/model.py", line 183, in get_context
E raise TypeError("No context on context stack")
E TypeError: No context on context stack
E '''
../../../miniconda2/envs/testenv/lib/python2.7/site-packages/joblib/externals/loky/_base.py:381: BrokenProcessPool
https://travis-ci.org/pymc-devs/pymc3/jobs/455064428
Any ideas? Was the pickling behavior changed somehow?
Issue Analytics
- Created 5 years ago
- Comments: 7 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @twiecki, I have been looking at the error in the build, and here is what I think is going on.
Recent versions of `joblib` currently use `loky` as the default backend, where Python objects used in the child processes must be serialized to be sent to the workers.

The problem with your current test is that the `sgfs` instance being serialized relies, at creation time, on a class attribute (the `Context.contexts` stack) that gets mutated during the execution of the parent process (when `Model.__enter__` is called). Unfortunately, that mutation does not happen in the child processes, hence the error. A rough reproduction of this failure mode is sketched below.
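To make the mechanism concrete, here is a self-contained sketch (Python 3.7+, standard library only; the `Context`/`Step` classes are made up to mimic the pattern and are not pymc3 code). The object is created while a context is on the stack in the parent, but a freshly started worker has an empty stack, so deserializing the task arguments blows up and the pool is marked broken:

```python
# Hypothetical reproduction of the failure mode -- none of these names are
# pymc3's; they only mimic the Context / Model / ArrayStep pattern.
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


class Context:
    contexts = []                      # class-level stack, filled by ``with`` blocks

    def __enter__(self):
        Context.contexts.append(self)
        return self

    def __exit__(self, *exc):
        Context.contexts.pop()


class Step:
    def __init__(self):
        # Runs again when the object is unpickled (see __reduce__), similar to
        # ArrayStep.__new__ calling modelcontext() in the traceback above.
        if not Context.contexts:
            raise TypeError("No context on context stack")

    def __reduce__(self):
        return (Step, ())


def run(step):
    return "ok"


if __name__ == "__main__":
    with Context():
        step = Step()  # works: the parent's context stack is populated
        # A freshly spawned worker re-imports this module with an empty
        # Context.contexts, so unpickling `step` there raises the TypeError,
        # which surfaces in the parent as a BrokenProcessPool.
        with ProcessPoolExecutor(mp_context=mp.get_context("spawn")) as ex:
            print(ex.submit(run, step).result())
```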
A quick fix option is to use "multiprocessing" instead of "loky" as the backend for `joblib` (that does not require any input serialization): `Parallel(backend='multiprocessing')`. Doing this made the test pass for me on my local laptop; a usage sketch follows.
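For illustration, a rough sketch of what the switch could look like (the `fit_chain` function is a placeholder, not pymc3's actual per-chain job):

```python
from joblib import Parallel, delayed


def fit_chain(chain_id):
    # placeholder for the per-chain sampling work
    return chain_id


# On POSIX the multiprocessing backend forks the parent, so workers inherit
# the already-populated context stack and the context lookup succeeds there.
traces = Parallel(n_jobs=4, backend="multiprocessing")(
    delayed(fit_chain)(i) for i in range(4)
)
```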
For the record, tests are passing on python3 because `concurrent.futures` also uses multiprocessing as its default backend.

One tricky case would be running this test under Windows, because the only available method there to create new processes is spawn, so objects must be serialized at some point. However, I did not find any CI scripts running under Windows for `pymc3`. Could you confirm that?
Note that the `fork` start method of multiprocessing, used by default in `concurrent.futures`, is still problematic (crash or freeze) when using libraries that rely on OpenMP thread pools (e.g. lightgbm, xgboost, and soon scikit-learn). So I would advise you to use the `spawn` or `forkserver` start method of `concurrent.futures`, in which case the mutated attribute will not be visible in the child processes.

When doing parallel calls with `concurrent.futures`, you should probably pass the model context explicitly; a sketch combining both points follows.
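A rough sketch of that combination (assuming Python 3.7+ and that the model object itself pickles cleanly; `sample_one_chain` and its arguments are made up for illustration): the executor gets an explicit start method, and each worker re-enters the model context it is handed instead of relying on inherited state:

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

import pymc3 as pm


def sample_one_chain(model, draws, seed):
    # Re-enter the model context explicitly in the worker instead of relying
    # on the class-level context stack mutated in the parent process.
    with model:
        return pm.sample(draws=draws, chains=1, cores=1,
                         random_seed=seed, progressbar=False)


if __name__ == "__main__":
    with pm.Model() as model:
        pm.Normal("x", mu=0.0, sd=1.0)

    # spawn/forkserver avoid the fork + OpenMP problems ("forkserver" is
    # POSIX-only; use "spawn" on Windows).
    ctx = mp.get_context("forkserver")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as ex:
        futures = [ex.submit(sample_one_chain, model, 500, seed) for seed in (1, 2)]
        traces = [f.result() for f in futures]
```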