make joblib.Parallel return a generator

Often one wants to perform simple operations on the output of a very long sequence of tasks. If the number of outputs is large, storing them all in a list may be inefficient or impossible. I propose adding functionality to joblib.Parallel so that one can instead write:

parallel_job = ( delayed( job )( param ) for param in so_many_job_params )  # generator for input
for output in Parallel(n_jobs=10, iterable=parallel_job):                  # generator as output 
   do_something( output )

In the example above, I’ve added the job iterable to the constructor of Parallel. The only required change would be to add an __iter__(self) method to Parallel that behaves almost identically to __call__(iterable), except that it reads the jobs from self.iterable and yields each output as its job completes, rather than returning a list of all outputs.
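As an illustration of the proposed interface, here is a minimal sketch built on the standard library's concurrent.futures rather than on joblib internals. The class name LazyParallel and the local delayed helper are hypothetical stand-ins, not joblib API; the sketch assumes thread-based workers and yields outputs in submission order.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def delayed(func):
    """Hypothetical stand-in for joblib.delayed: capture a call without running it."""
    def capture(*args, **kwargs):
        return func, args, kwargs
    return capture


class LazyParallel:
    """Hypothetical sketch of the proposed interface; not part of joblib."""

    def __init__(self, n_jobs, iterable):
        self.n_jobs = n_jobs
        # iterable of (func, args, kwargs) tuples, consumed lazily
        self.iterable = iterable

    def __iter__(self):
        tasks = iter(self.iterable)
        with ThreadPoolExecutor(max_workers=self.n_jobs) as pool:
            # keep at most 2 * n_jobs tasks in flight so the input stays lazy
            pending = deque(
                pool.submit(func, *args, **kwargs)
                for func, args, kwargs in islice(tasks, 2 * self.n_jobs)
            )
            while pending:
                # hand back the oldest result, then top up with one new task
                yield pending.popleft().result()
                for func, args, kwargs in islice(tasks, 1):
                    pending.append(pool.submit(func, *args, **kwargs))


def job(param):
    return param * 2


for output in LazyParallel(n_jobs=2, iterable=(delayed(job)(p) for p in range(5))):
    print(output)  # prints 0, 2, 4, 6, 8, one per line
```

Only a bounded number of futures exist at any time, so neither the inputs nor the outputs are ever materialized as a full list.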

Issue Analytics

  • State: open
  • Created: 8 years ago
  • Reactions: 15
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

1 reaction
beasteers commented, Jun 19, 2020

I would love this functionality to be added, if only so you could wrap Parallel with a progress bar. I see there are a bunch of closed PRs trying to implement this, but none of them have been merged for some reason 😕

In the meantime, if anyone wants a quick drop-in (albeit hacky) solution, this has been working for me. It works without having to copy and make deep edits to the original code.

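import threading
import time

import joblib
from joblib import delayed


class Parallel(joblib.Parallel):
    def it(self, iterable):
        """Yield results in order as they appear, instead of returning a list.

        Relies on the private attribute ``_output``, an implementation
        detail that may differ between joblib versions.
        """
        # run the blocking __call__ in a background thread
        t = threading.Thread(target=self.__call__, args=(iterable,))
        t.start()
        try:
            i = 0
            output = self._output
            while t.is_alive() or (output and i < len(output)):
                # catch the list reference and store before it's overwritten
                if output is None:
                    output = self._output
                # yield when a new item appears
                if output and i < len(output):
                    yield output[i]
                    i += 1
                else:
                    # nothing new yet: sleep briefly instead of busy-waiting
                    time.sleep(0.01)
        finally:
            t.join()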


########
# usage
########

import time
import tqdm

def task(x):
    time.sleep(1)
    return x * 2

xs = range(12)
it = Parallel(3).it(delayed(task)(x) for x in xs)
pbar = tqdm.tqdm(it, total=len(xs))
for x in pbar: 
    pbar.write(str(x))
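
The snippet above hands results from a background thread to the consumer by polling a list. As a hedged illustration of the same hand-off without polling or private attributes, the sketch below uses a standard-library queue. The names iterate_in_background, run_blocking, and emit are hypothetical, and wiring this into joblib itself is not shown.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def iterate_in_background(run_blocking):
    """Yield every item that run_blocking(emit) passes to emit, as it arrives.

    run_blocking and emit are hypothetical names used only in this sketch.
    """
    results = queue.Queue()

    def worker():
        try:
            run_blocking(results.put)
        finally:
            # always unblock the consumer, even if run_blocking raised
            results.put(_DONE)

    t = threading.Thread(target=worker)
    t.start()
    try:
        while True:
            item = results.get()  # blocks until an item arrives; no polling
            if item is _DONE:
                break
            yield item
    finally:
        t.join()


def produce(emit):
    # stand-in for a long-running blocking call that reports each result
    for x in range(5):
        emit(x * 2)


for value in iterate_in_background(produce):
    print(value)  # prints 0, 2, 4, 6, 8, one per line
```

Note that an exception raised inside the worker thread is not re-raised in the consumer in this sketch; the stream simply ends.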

1 reaction
cgarciae commented, May 28, 2019

I coded this in a library called pypeln. However, it uses multiprocessing, and some libraries on OSX crash because of this, so I have to fall back to joblib on that OS for some of the work I am doing.
