How to yield result instead of getting the result list?
This is my code:
```python
import time

from joblib import Parallel, delayed


def producer():
    with open("some_large_file") as f:
        for line in f:
            yield line


def func(i):
    time.sleep(1)  # or other time-consuming operations
    return i


out = Parallel(n_jobs=10, verbose=100)(delayed(func)(i) for i in producer())
print(out)
```
My goal is to read a large file (about 10 GB) and perform some operation on each line. The final results are collected in the `out` variable, which is a list held entirely in memory. Can joblib yield each result as soon as it is ready, while the jobs are still running? Then I could write the results to a file instead of storing them all in memory.
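One workaround that relies only on the existing `Parallel` call shown above is to pull the input in fixed-size batches, run `Parallel` on one batch at a time, and write each batch's results before reading the next. This is a minimal sketch, assuming that waiting for the slowest job in each batch is acceptable. `chunks`, `stream_results`, and `make_joblib_runner` are hypothetical helper names, and joblib is imported only inside `make_joblib_runner`, so the rest of the block runs without it.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of up to `size` items, reading the input lazily."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def stream_results(run_batch, items, batch_size, write):
    """Process `items` one batch at a time and pass every result to `write`
    as soon as its batch finishes, so only one batch of results is in memory.
    Returns the number of results written."""
    count = 0
    for batch in chunks(items, batch_size):
        for result in run_batch(batch):
            write(result)
            count += 1
    return count


def make_joblib_runner(func, n_jobs=10):
    """Hypothetical helper: wrap joblib so that one call processes one batch."""
    from joblib import Parallel, delayed  # only this helper needs joblib

    parallel = Parallel(n_jobs=n_jobs)

    def run_batch(batch):
        # Returns a list, but only for the current batch.
        return parallel(delayed(func)(x) for x in batch)

    return run_batch
```

Applied to the code in the question, this would look like `with open("results.txt", "w") as f: stream_results(make_joblib_runner(func), producer(), 1000, f.write)`, where `results.txt` and the batch size of 1000 are placeholders (this works because `func` returns each line unchanged, trailing newline included). The trade-off is that workers sit idle at the end of each batch while the slowest job finishes; larger batches reduce that overhead at the cost of holding more results in memory.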
Top GitHub Comments
This feature is quite challenging to implement (there is a danger of race conditions, in particular when dealing with exceptions).
There is a pull request in progress for this feature: https://github.com/joblib/joblib/pull/588. We hope to merge it soonish, but these things are tricky.
I second this feature request. My parallel jobs often have large outputs that I want to process and write to disk as they become available, rather than keep them in memory until all jobs have completed. I would use something like `multiprocessing.Pool.imap` for this, but I need joblib's advanced pickling and memmap conversion.
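For comparison, the streaming behaviour requested in this thread can be sketched with the standard library's `concurrent.futures`. `Executor.map` submits every task up front, so for a 10 GB input this sketch instead keeps a bounded window of pending futures and yields results in input order as the oldest one completes. `bounded_imap` and `max_pending` are hypothetical names. The sketch offers none of joblib's pickling or memmap conversion; for CPU-bound work you would pass a `ProcessPoolExecutor` (with a picklable, module-level function) instead of threads.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def bounded_imap(executor, func, iterable, max_pending):
    """Lazily yield func(x) for each x in `iterable`, in input order,
    with at most `max_pending` tasks submitted to `executor` at a time."""
    if max_pending < 1:
        raise ValueError("max_pending must be >= 1")
    it = iter(iterable)
    # Fill the initial window without consuming the rest of the input.
    pending = deque(executor.submit(func, x) for x in islice(it, max_pending))
    while pending:
        oldest = pending.popleft()
        # Top up the window before blocking, so the workers stay busy.
        for x in islice(it, 1):
            pending.append(executor.submit(func, x))
        # Blocks until the oldest task is done; re-raises its exception, if any.
        yield oldest.result()


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as executor:
        for value in bounded_imap(executor, lambda x: x * x, range(10), max_pending=4):
            print(value)  # results arrive one at a time, in input order
```

If a task raises, `oldest.result()` re-raises the exception and the generator stops, while the tasks still in the window keep running until the executor shuts down. Deciding what to do with those in-flight tasks is exactly the kind of exception-handling subtlety that makes a built-in version of this feature tricky.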