bloscpack locked in multiprocessing mode


Here is a minimal reproduction of my problem: the code freezes in multiprocess mode but works fine with a single process. The code below creates a dummy array that simulates my image files and saves it using multiple processes. With the number of processes set to one it works, but with more than one process it freezes.

I use Ubuntu 16.04 and Python 3.6 with Conda.


import os
import sys
import tempfile

import numpy as np
import bloscpack as bp

from tqdm import tqdm
from concurrent.futures import ProcessPoolExecutor, as_completed


def parallel_process(array, function, n_jobs=16, use_kwargs=False, front_num=3):
    """
        A parallel version of the map function with a progress bar. 

        Args:
            array (array-like): An array to iterate over.
            function (function): A python function to apply to the elements of array
            n_jobs (int, default=16): The number of cores to use
            use_kwargs (boolean, default=False): Whether to consider the elements of array as dictionaries of 
                keyword arguments to function 
            front_num (int, default=3): The number of iterations to run serially before kicking off the parallel job. 
                Useful for catching bugs
        Returns:
            [function(array[0]), function(array[1]), ...]
    """
    #We run the first few iterations serially to catch bugs
    #Start from an empty list so that front is defined even when front_num is 0
    front = []
    if front_num > 0:
        front = [function(**a) if use_kwargs else function(a) for a in array[:front_num]]
    #If we set n_jobs to 1, just run a list comprehension. This is useful for benchmarking and debugging.
    if n_jobs==1:
        return front + [function(**a) if use_kwargs else function(a) for a in tqdm(array[front_num:])]
    #Assemble the workers
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        #Pass the elements of array into function
        if use_kwargs:
            futures = [pool.submit(function, **a) for a in array[front_num:]]
        else:
            futures = [pool.submit(function, a) for a in array[front_num:]]
        kwargs = {
            'total': len(futures),
            'unit': 'it',
            'unit_scale': True,
            'leave': True
        }
        #Print out the progress as tasks complete
        for f in tqdm(as_completed(futures), **kwargs):
            pass
    out = []
    #Get the results from the futures. 
    for i, future in tqdm(enumerate(futures)):
        try:
            out.append(future.result())
        except Exception as e:
            out.append(e)
    return front + out


def dump_blosc(data, filename):
    with open(filename, 'wb') as f:
        f.write(bp.pack_ndarray_str(data))


def write_data(inputs):
    dummy = np.random.rand(16,3,224,224).astype('uint8')
    tf = tempfile.NamedTemporaryFile()
    dump_blosc(dummy, tf.name)


if __name__ == '__main__':
    parallel_process(range(100), write_data, n_jobs=2)
    # for dir in dirs:
    #     print(dir)
    #     write_data(dir)

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 24 (22 by maintainers)

Top GitHub Comments

esc commented, Jun 9, 2017

Also, the following code seems to work:

    with ProcessPoolExecutor(max_workers=4) as e:
        res = list(e.map(write_data, tqdm(range(100000000))))
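
One visible difference from the reproducer is that this snippet never calls `write_data` in the parent process before the pool starts, whereas `parallel_process` runs the first `front_num` items serially. The comments shown here do not confirm that this is what avoids the freeze. Below is a minimal sketch of a helper built on the same `Executor.map` pattern, assuming a hypothetical name `pool_map` and a stand-in worker `square` (neither is part of bloscpack or the original code):

```python
from concurrent.futures import ProcessPoolExecutor


def pool_map(function, items, n_jobs=4, executor_cls=ProcessPoolExecutor):
    """Apply `function` to every item inside the pool, keeping input order.

    Hypothetical helper for illustration only. Unlike parallel_process,
    it does not call `function` in the parent before the workers start.
    """
    with executor_cls(max_workers=n_jobs) as pool:
        # Executor.map yields results in submission order and re-raises
        # a worker exception instead of storing it in the output list.
        return list(pool.map(function, items))


def square(x):
    # Stand-in worker; in the issue this role is played by write_data.
    return x * x
```

With the reproducer's own worker, the call would look like `pool_map(write_data, range(100), n_jobs=2)` placed under the `if __name__ == '__main__':` guard. The `executor_cls` parameter is an assumption added so the same helper can be tried with `ThreadPoolExecutor` when checking whether a hang is specific to process pools.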

FrancescAlted commented, Jun 16, 2017

That’s interesting. As always, PRs are welcome and all that jazz. Thanks for pointing this out!
