make joblib.Parallel return a generator
Often one wants to perform simple operations on the output of a very long sequence of tasks. If the number of outputs is large, it may be inefficient or impossible to store them all in a list. Instead, add functionality to joblib.Parallel so that one can do:
    parallel_job = (delayed(job)(param) for param in so_many_job_params)  # generator for input
    for output in Parallel(n_jobs=10, iterable=parallel_job):  # generator as output
        do_something(output)
In the example above, I’ve added the job iterable to the constructor of Parallel. The only required change would be to add an __iter__(self) method to Parallel which has almost identical functionality to __call__(self, iterable), but instead uses self.iterable and yields one element per completed job, rather than returning a list of outputs.
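To make the requested behavior concrete, here is a minimal sketch using only the standard library. It is not joblib's implementation: the helper name iter_parallel and the square example are hypothetical, and it uses threads rather than joblib's worker processes for simplicity. It reads the input generator lazily, keeps at most n_jobs tasks in flight, and yields each result as soon as its task finishes, so results arrive in completion order rather than input order.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice


def iter_parallel(func, params, n_jobs=4):
    """Yield func(param) for each param as soon as each call finishes.

    Hypothetical helper, not part of joblib. At most n_jobs calls are in
    flight at once, so `params` may be an arbitrarily long generator.
    """
    params = iter(params)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Prime the pool with the first n_jobs tasks.
        pending = {pool.submit(func, p) for p in islice(params, n_jobs)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            # Submit one new task per finished one before handing results
            # to the consumer, so the workers stay busy.
            for p in islice(params, len(done)):
                pending.add(pool.submit(func, p))
            for future in done:
                yield future.result()


def square(x):
    return x * x


total = 0
for output in iter_parallel(square, (i for i in range(1000)), n_jobs=10):
    total += output  # each output is consumed, then discarded
```

Because the loop body only ever holds one output at a time, the full list of results is never materialized; the same loop is also a natural place to update a progress bar.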
Issue Analytics
- State:
- Created 8 years ago
- Reactions:15
- Comments:11 (4 by maintainers)
I would love to see this functionality added, if only so you could wrap Parallel with a progress bar. I see there are a bunch of closed PRs trying to implement this, but none of them have been merged for some reason 😕

In the meantime, if anyone wants a quick drop-in (albeit hacky) solution, this has been working for me. It works without having to copy and make deep edits to the original code.
I coded this in a library called pypeln. However, it uses multiprocessing, and some libraries crash on OSX because of this, so I have to fall back to joblib on that OS for some work I am doing.