bloscpack locked in multiprocessing mode
See original GitHub issue
Here I share a minimal reproduction of my problem: the code freezes in multiprocessing mode but works just fine with a single process. The code below creates a dummy array simulating my image files and tries to save the arrays with multiple processes. If you reduce the number of processes to one, it works fine, but with more than one process it freezes.
I use Ubuntu 16.04 and Python 3.6 with Conda.
import os
import sys
import tempfile

import numpy as np
import bloscpack as bp
from tqdm import tqdm
from concurrent.futures import ProcessPoolExecutor, as_completed

def parallel_process(array, function, n_jobs=16, use_kwargs=False, front_num=3):
    """
    A parallel version of the map function with a progress bar.

    Args:
        array (array-like): An array to iterate over.
        function (function): A python function to apply to the elements of array
        n_jobs (int, default=16): The number of cores to use
        use_kwargs (boolean, default=False): Whether to consider the elements of array as dictionaries of
            keyword arguments to function
        front_num (int, default=3): The number of iterations to run serially before kicking off the parallel job.
            Useful for catching bugs
    Returns:
        [function(array[0]), function(array[1]), ...]
    """
    # We run the first few iterations serially to catch bugs
    front = []
    if front_num > 0:
        front = [function(**a) if use_kwargs else function(a) for a in array[:front_num]]
    # If we set n_jobs to 1, just run a list comprehension. This is useful for benchmarking and debugging.
    if n_jobs == 1:
        return front + [function(**a) if use_kwargs else function(a) for a in tqdm(array[front_num:])]
    # Assemble the workers
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        # Pass the elements of array into function
        if use_kwargs:
            futures = [pool.submit(function, **a) for a in array[front_num:]]
        else:
            futures = [pool.submit(function, a) for a in array[front_num:]]
        kwargs = {
            'total': len(futures),
            'unit': 'it',
            'unit_scale': True,
            'leave': True
        }
        # Print out the progress as tasks complete
        for f in tqdm(as_completed(futures), **kwargs):
            pass
    out = []
    # Get the results from the futures.
    for i, future in tqdm(enumerate(futures)):
        try:
            out.append(future.result())
        except Exception as e:
            out.append(e)
    return front + out

def dump_blosc(data, filename):
    with open(filename, 'wb') as f:
        f.write(bp.pack_ndarray_str(data))

def write_data(inputs):
    dummy = np.random.rand(16, 3, 224, 224).astype('uint8')
    tf = tempfile.NamedTemporaryFile()
    dump_blosc(dummy, tf.name)

if __name__ == '__main__':
    parallel_process(range(100), write_data, n_jobs=2)
    # for dir in dirs:
    #     print(dir)
    #     write_data(dir)
Issue Analytics
- State:
- Created 6 years ago
- Comments:24 (22 by maintainers)
Top Results From Across the Web

python multiprocessing lock issue - Stack Overflow
It uses a process pool initializer to set the manager dict as a global in each child process. FYI: Using a lock is...
Read more >

blosc-1.20.1-bp153.1.19 - SUSE Package Hub
- Update to 1.11.1 - Fixed a bug introduced in 1.11.0 and discovered by pandas test suite. This basically prevented to decompress buffers...
Read more >

bloscpack - PyPI
Command line interface to and serialization format for Blosc, a high performance, multi-threaded, blocking and shuffling compressor. Uses python-blosc bindings ...
Read more >

Changelog for Blosc 1.16.2 - ABI laboratory
So as to select the split mode, a new API function has been introduced: ... See https://github.com/Blosc/bloscpack/issues/50. Changes from 1.9.3 to 1.10.0 ...
Read more >

Use a Lock in the Multiprocessing Pool - Super Fast Python
Now that we know how to share a multiprocessing.Lock with child worker processes in the process pool, let's look at some worked examples....
Read more >
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Also, the following code seems to work:
That’s interesting. As always, PRs are welcome and all that jazz. Thanks for pointing this out!