
Parallelization fails when inner function uses numba guvectorize.

See original GitHub issue

EDIT: Removed environment variables and target="parallel" in guvectorize to make the example more minimal.

My machine:

  • Ubuntu 19.04

Package versions:

  • numpy=1.18.1
  • numba=0.49.0
  • joblib=0.14.1

Problem: I want to parallelize code that internally calls functions decorated with numba's guvectorize. The error says that the arguments of the function are not picklable; however, the arguments are numpy arrays, which are picklable.

Minimal Example:

import joblib
import numpy as np
from numba import guvectorize


@guvectorize(
    ["f8[:], f8[:]"], "(n) -> ()", nopython=True
)
def func(a, out):
    out_ = a.sum()
    out[0] = out_

n = 2
a_list = [np.random.randn(n) for _ in range(2)]

# does not work
with joblib.parallel_backend("loky", n_jobs=2):
    result = joblib.Parallel()(
        joblib.delayed(func)(a) for a in a_list
    )

Traceback:

_RemoteTraceback:
'''
Traceback (most recent call last):
  File "/home/tm/.local/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 391, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "/usr/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/usr/lib/python3/dist-packages/numpy/core/__init__.py", line 149, in _ufunc_reconstruct
    return getattr(mod, name)
AttributeError: module '__main__' has no attribute 'func'
'''

The above exception was the direct cause of the following exception:

BrokenProcessPool                         Traceback (most recent call last)
<ipython-input-8-51ac1601ff38> in <module>
      1 with joblib.parallel_backend("loky", n_jobs=2):
      2     result = joblib.Parallel()(
----> 3         joblib.delayed(func)(a) for a in a_list
      4         )
      5

~/.local/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
    932
    933             with self._backend.retrieval_context():
--> 934                 self.retrieve()
    935             # Make sure that we get a last message telling us we are done
    936             elapsed_time = time.time() - self._start_time

~/.local/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
    831             try:
    832                 if getattr(self._backend, 'supports_timeout', False):
--> 833                     self._output.extend(job.get(timeout=self.timeout))
    834                 else:
    835                     self._output.extend(job.get())

~/.local/lib/python3.7/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    519         AsyncResults.get from multiprocessing."""
    520         try:
--> 521             return future.result(timeout=timeout)
    522         except LokyTimeoutError:
    523             raise TimeoutError()

/usr/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433             else:
    434                 raise TimeoutError()

/usr/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions
ogrisel commented, Apr 28, 2021

As I said above, just put the code of your function in a module that you import, instead of putting it in the main script or in an interactive Jupyter session.
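
A minimal sketch of that suggestion, with hypothetical file names (gufuncs.py for the module, main.py for the driver script): once the decorated function lives in an importable module, the loky workers can resolve it by name when they unpickle the tasks.

# gufuncs.py -- hypothetical module holding the guvectorize-decorated function
from numba import guvectorize

@guvectorize(["f8[:], f8[:]"], "(n)->()", nopython=True)
def func(a, out):
    out[0] = a.sum()

# main.py -- hypothetical driver script; workers re-import gufuncs.func when unpickling
import joblib
import numpy as np

from gufuncs import func

a_list = [np.random.randn(2) for _ in range(2)]

with joblib.parallel_backend("loky", n_jobs=2):
    result = joblib.Parallel()(joblib.delayed(func)(a) for a in a_list)

The same pattern works from a notebook, provided the module is importable by the worker processes.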

0 reactions
lesteve commented, Jun 1, 2021

Let’s close as it seems more related to numba.

Read more comments on GitHub >

Top Results From Across the Web

Automatic parallelization with @jit
Some operations inside a user defined function, e.g. adding a scalar value to an array, are known to have parallel semantics. A user...
Read more >
Numba, Numpy: Call @guvectorized function in a parallel ...
To answer my own question. In the Numba Gitter chat, someone pointed out: The issue is that guvectorize is generating a true NumPy...
Read more >
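
A minimal check, not from the thread, that illustrates the snippet above using the reproducer's own function: the guvectorize-decorated object is a true numpy.ufunc, and pickle stores ufuncs by reference, as a (module name, attribute name) pair that the receiving process resolves with getattr. That is exactly the lookup that fails in the traceback when the module is the interactive __main__.

import pickle

from numba import guvectorize

@guvectorize(["f8[:], f8[:]"], "(n)->()", nopython=True)
def func(a, out):
    out[0] = a.sum()

print(type(func))             # <class 'numpy.ufunc'>, not a plain Python function
payload = pickle.dumps(func)  # succeeds: only a module/attribute reference is stored
# Unpickling in a worker re-runs getattr(module, "func"); if func only exists in
# an interactive __main__, that lookup raises the AttributeError shown above.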
Automatic parallelization with @jit - Numba documentation
Some operations inside a user defined function, e.g. adding a scalar value to an array, are known to have parallel semantics. A user...
Read more >
Parallelism in Python* Using Numba*: The Fundamentals
The first requirement for using Numba is that your target code for JIT or LLVM compilation optimization must be enclosed inside a function....
Read more >
Parallel Python with Numba and ParallelAccelerator
To use multiple cores in a Python program, there are three options. ... functions, or as inner functions inside a parallel @jit function....
Read more >
