Parallelization fails when inner function uses numba guvectorize.
EDIT: Removed environment variables and target="parallel" in guvectorize to make the example more minimal.
My machine:
- Ubuntu 19.04
Package versions:
- numpy=1.18.1
- numba=0.49.0
- joblib=0.14.1
Problem: I want to parallelize code that internally calls functions decorated with numba's guvectorize. The error says that the arguments of the function are not picklable; however, the arguments are NumPy arrays, which are picklable.
Minimal Example:
```python
import joblib
import numpy as np
from numba import guvectorize


@guvectorize(
    ["f8[:], f8[:]"], "(n) -> ()", nopython=True
)
def func(a, out):
    out_ = a.sum()
    out[0] = out_


n = 2
a_list = [np.random.randn(n) for _ in range(2)]

# does not work
with joblib.parallel_backend("loky", n_jobs=2):
    result = joblib.Parallel()(
        joblib.delayed(func)(a) for a in a_list
    )
```
Traceback:

```
_RemoteTraceback:
'''
Traceback (most recent call last):
  File "/home/tm/.local/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 391, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "/usr/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/usr/lib/python3/dist-packages/numpy/core/__init__.py", line 149, in _ufunc_reconstruct
    return getattr(mod, name)
AttributeError: module '__main__' has no attribute 'func'
'''

The above exception was the direct cause of the following exception:

BrokenProcessPool                         Traceback (most recent call last)
<ipython-input-8-51ac1601ff38> in <module>
      1 with joblib.parallel_backend("loky", n_jobs=2):
      2     result = joblib.Parallel()(
----> 3         joblib.delayed(func)(a) for a in a_list
      4     )
      5

~/.local/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
    932
    933 with self._backend.retrieval_context():
--> 934 self.retrieve()
    935 # Make sure that we get a last message telling us we are done
    936 elapsed_time = time.time() - self._start_time

~/.local/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
    831 try:
    832 if getattr(self._backend, 'supports_timeout', False):
--> 833 self._output.extend(job.get(timeout=self.timeout))
    834 else:
    835 self._output.extend(job.get())

~/.local/lib/python3.7/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    519 AsyncResults.get from multiprocessing."""
    520 try:
--> 521 return future.result(timeout=timeout)
    522 except LokyTimeoutError:
    523 raise TimeoutError()

/usr/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
    430 raise CancelledError()
    431 elif self._state == FINISHED:
--> 432 return self.__get_result()
    433 else:
    434 raise TimeoutError()

/usr/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
    382 def __get_result(self):
    383 if self._exception:
--> 384 raise self._exception
    385 else:
    386 return self._result

BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
```
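The remote traceback shows that the arrays are not the problem. In this NumPy version a ufunc is pickled by reference: only a module name and an attribute name travel to the worker, which rebuilds the ufunc with getattr(mod, name). Since func was defined in the interactive session's __main__, the worker's own __main__ has no attribute func. The sketch below reproduces that by-reference mechanism with the standard library only. Assumptions: a plain Python function stands in for the guvectorize ufunc, the module name fake_ufuncs is hypothetical, and deleting the attribute simulates a worker whose module lacks the name.

```python
import pickle
import sys
import types

# Hypothetical module standing in for the place where the function is defined.
fake_module = types.ModuleType("fake_ufuncs")
sys.modules["fake_ufuncs"] = fake_module


def total(a):
    """Plain-Python stand-in for the guvectorize'd func from the issue."""
    return sum(a)


# Make total look like it lives in fake_ufuncs, so pickle records that location.
total.__module__ = "fake_ufuncs"
total.__qualname__ = "total"
fake_module.total = total

# Pickling a function stores only the module name and the attribute name, not its code.
payload = pickle.dumps(total)
print(b"fake_ufuncs" in payload, b"total" in payload)  # True True

# While the name is reachable in that module, unpickling returns the same object.
assert pickle.loads(payload) is total

# Simulate a worker whose copy of the module lacks the name (like the worker's __main__).
del fake_module.total
error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    error = str(exc)
print(error)

# Remove the hypothetical module again.
del sys.modules["fake_ufuncs"]
```

The final unpickling fails with an AttributeError, the same exception type as in the remote traceback, even though nothing about the data changed.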
Top GitHub Comments
As I said above, just put the code of your function in a module that you import, instead of putting it in the main script or in an interactive Jupyter session.
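This advice works because of the same by-reference lookup: if the function lives in a module the worker can import, the worker resolves the module and name on its own. A minimal stdlib-only sketch, assuming a fresh python -c interpreter is a fair stand-in for a loky worker and again using a plain function in place of the guvectorize ufunc; the module name my_kernels and the helper run_in_fresh_process are hypothetical.

```python
import importlib
import os
import pickle
import subprocess
import sys
import tempfile

# Child interpreter: unpickle the function from stdin and call it, like a worker would.
CHILD = (
    "import pickle, sys\n"
    "f = pickle.loads(sys.stdin.buffer.read())\n"
    "print(f([1.0, 2.0]))\n"
)


def run_in_fresh_process(payload, pythonpath):
    """Hypothetical helper: run CHILD in a new interpreter with the given PYTHONPATH."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    return subprocess.run(
        [sys.executable, "-c", CHILD], input=payload, capture_output=True, env=env
    )


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module file; in the issue, the @guvectorize function would live here.
    with open(os.path.join(tmp, "my_kernels.py"), "w") as fh:
        fh.write("def total(a):\n    return sum(a)\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    my_kernels = importlib.import_module("my_kernels")
    payload = pickle.dumps(my_kernels.total)  # records "my_kernels" and "total"

    # A worker that can import the module resolves the reference and runs the function.
    importable = run_in_fresh_process(payload, tmp)
    # A worker that cannot import the module fails while unpickling.
    not_importable = run_in_fresh_process(payload, None)

    sys.path.remove(tmp)
    del sys.modules["my_kernels"]

print(importable.returncode, importable.stdout.strip())  # 0 b'3.0'
print(not_importable.returncode != 0)  # True
```

Applied to the issue, this would mean moving the @guvectorize definition of func into a file such as my_kernels.py (hypothetical name) and using from my_kernels import func before calling joblib.delayed(func); it assumes the loky workers can import that file, for example because it sits on their module search path.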
Let's close this, as it seems more related to numba.