
Parallelization fails when inner function uses numba guvectorize.

See original GitHub issue

EDIT: Removed environment variables and target="parallel" in guvectorize to make the example more minimal.

My machine:

  • Ubuntu 19.04

Package versions:

  • numpy=1.18.1
  • numba=0.49.0
  • joblib=0.14.1

Problem: I want to parallelize code that internally calls functions decorated with numba's guvectorize. The error says that the arguments of the function are not picklable; however, the arguments are numpy arrays, which are picklable.

Minimal Example:

import joblib
import numpy as np
from numba import guvectorize


@guvectorize(
    ["f8[:], f8[:]"], "(n) -> ()", nopython=True
)
def func(a, out):
    out_ = a.sum()
    out[0] = out_

n = 2
a_list = [np.random.randn(n) for _ in range(2)]

# does not work
with joblib.parallel_backend("loky", n_jobs=2):
    result = joblib.Parallel()(
        joblib.delayed(func)(a) for a in a_list
    )

Traceback:

_RemoteTraceback:
'''
Traceback (most recent call last):
  File "/home/tm/.local/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 391, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "/usr/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/usr/lib/python3/dist-packages/numpy/core/__init__.py", line 149, in _ufunc_reconstruct
    return getattr(mod, name)
AttributeError: module '__main__' has no attribute 'func'
'''

The above exception was the direct cause of the following exception:

BrokenProcessPool                         Traceback (most recent call last)
<ipython-input-8-51ac1601ff38> in <module>
      1 with joblib.parallel_backend("loky", n_jobs=2):
      2     result = joblib.Parallel()(
----> 3         joblib.delayed(func)(a) for a in a_list
      4         )
      5

~/.local/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
    932
    933             with self._backend.retrieval_context():
--> 934                 self.retrieve()
    935             # Make sure that we get a last message telling us we are done
    936             elapsed_time = time.time() - self._start_time

~/.local/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
    831             try:
    832                 if getattr(self._backend, 'supports_timeout', False):
--> 833                     self._output.extend(job.get(timeout=self.timeout))
    834                 else:
    835                     self._output.extend(job.get())

~/.local/lib/python3.7/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    519         AsyncResults.get from multiprocessing."""
    520         try:
--> 521             return future.result(timeout=timeout)
    522         except LokyTimeoutError:
    523             raise TimeoutError()

/usr/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433             else:
    434                 raise TimeoutError()

/usr/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions
ogrisel commented, Apr 28, 2021

As I said above, just put the code of your function in a module that you import, instead of putting it in the main script or in an interactive Jupyter session.
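
A minimal sketch of that suggestion, with hypothetical file names (gufuncs.py for the module, main.py for the driver script): once the decorated function lives in an importable module, the loky workers can resolve it by name when they unpickle the tasks.

# gufuncs.py -- hypothetical module holding the guvectorize-decorated function
from numba import guvectorize

@guvectorize(["f8[:], f8[:]"], "(n)->()", nopython=True)
def func(a, out):
    out[0] = a.sum()

# main.py -- hypothetical driver script; workers re-import gufuncs.func when unpickling
import joblib
import numpy as np

from gufuncs import func

a_list = [np.random.randn(2) for _ in range(2)]

with joblib.parallel_backend("loky", n_jobs=2):
    result = joblib.Parallel()(joblib.delayed(func)(a) for a in a_list)

The same pattern works from a notebook, provided the module is importable by the worker processes.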

0 reactions
lesteve commented, Jun 1, 2021

Let’s close as it seems more related to numba.

Read more comments on GitHub >

Top Results From Across the Web

Automatic parallelization with @jit
Some operations inside a user defined function, e.g. adding a scalar value to an array, are known to have parallel semantics. A user...
Read more >
Numba, Numpy: Call @guvectorized function in a parallel ...
To answer my own question. In the Numba Gitter chat, someone pointed out: The issue is that guvectorize is generating a true NumPy...
Read more >
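
A minimal check, not from the thread, that illustrates the snippet above using the reproducer's own function: the guvectorize-decorated object is a true numpy.ufunc, and pickle stores ufuncs by reference, as a (module name, attribute name) pair that the receiving process resolves with getattr. That is exactly the lookup that fails in the traceback when the module is the interactive __main__.

import pickle

from numba import guvectorize

@guvectorize(["f8[:], f8[:]"], "(n)->()", nopython=True)
def func(a, out):
    out[0] = a.sum()

print(type(func))             # <class 'numpy.ufunc'>, not a plain Python function
payload = pickle.dumps(func)  # succeeds: only a module/attribute reference is stored
# Unpickling in a worker re-runs getattr(module, "func"); if func only exists in
# an interactive __main__, that lookup raises the AttributeError shown above.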
Automatic parallelization with @jit - Numba documentation
Some operations inside a user defined function, e.g. adding a scalar value to an array, are known to have parallel semantics. A user...
Read more >
Parallelism in Python* Using Numba*: The Fundamentals
The first requirement for using Numba is that your target code for JIT or LLVM compilation optimization must be enclosed inside a function....
Read more >
Parallel Python with Numba and ParallelAccelerator
To use multiple cores in a Python program, there are three options. ... functions, or as inner functions inside a parallel @jit function....
Read more >
