
Program hangs when instantiating a GP using multiprocessing

See original GitHub issue

I’m trying to do what seems like a simple task: use multiprocessing to parallelize the optimize() call over many unique GPs. Here’s a minimal example of what I’m trying to do.

from GPy.core.gp import GP
from GPy.kern import White
from GPy.likelihoods.gaussian import Gaussian
from GPy.inference.latent_function_inference.laplace import Laplace
from multiprocessing import Pool
import numpy as np

# Wrapper needed so the function is pickleable, which is required for multiprocessing.Pool
def opt_wrapper(gp):
    return gp.optimize()  # Can replace with 'return 1' and the program still hangs

size = 100  # Program works when this is low enough
inference_method = Laplace()  # Program works when this is None
models = [GP(X=np.arange(size).reshape(size, 1), Y=np.arange(size).reshape(size, 1), kernel=White(1), likelihood=Gaussian(), inference_method=inference_method) for _ in range(1)]

print("Starting pool...")
pool = Pool(1)
print(pool.map(opt_wrapper, models))
pool.close()
pool.join()

The program simply hangs after printing “Starting pool…” Annoyingly, it also leaves a zombie process for each worker in the pool (just 1 in this example).

The program works just fine when any one of the following conditions is true:

  1. When size is less than about 60. For larger values, it hangs.
  2. When Laplace() is replaced with None. The Gaussian likelihood then defaults to ExactGaussianInference(); however, my actual project uses a custom likelihood that requires Laplace().
  3. When pool.map is replaced with the built-in map.
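As an aside, one generic mitigation for fork-related Pool hangs (an assumption on my part, not verified against GPy) is the 'spawn' start method, which gives each worker a fresh interpreter instead of a forked copy of the parent's state. A stdlib-only sketch, with an illustrative worker function in place of the GP code:

```python
# Sketch: run a Pool under the 'spawn' start method. The square()
# function is a placeholder, not part of GPy; spawn requires the
# worker function to be defined at module top level so it pickles.
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

Whether this helps depends on what state the forked child inherits (e.g. threads or locks held by numerical libraries at fork time).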

Lastly, it still breaks when you replace return gp.optimize() with return 1. Similarly, the following program hangs (same imports):

def make_gp(dummy):
    inference_method = Laplace()  # Again, the program works when Laplace() becomes None
    gp = GP(X=np.arange(size).reshape(size, 1), Y=np.arange(size).reshape(size, 1), kernel=White(1), likelihood=Gaussian(), inference_method=inference_method)
    return 1

size = 100  # Again, the program works when this is small
pool = Pool(1)
print(pool.map(make_gp, ['dummy']))  # Again, works with the built-in `map`
pool.close()
pool.join()
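While debugging this kind of hang, a defensive pattern (a generic stdlib sketch, not GPy-specific) is to use map_async with a timeout so a wedged worker raises an exception instead of blocking the parent forever:

```python
# Sketch: bound the wait on pool results so a hang surfaces as a
# TimeoutError rather than an indefinite block. work() is a
# placeholder for the real per-model task.
from multiprocessing import Pool, TimeoutError

def work(x):
    return x + 1

if __name__ == "__main__":
    with Pool(1) as pool:
        try:
            results = pool.map_async(work, [1, 2, 3]).get(timeout=30)
            print(results)  # [2, 3, 4]
        except TimeoutError:
            print("workers appear hung; inspect them with a sampling profiler or gdb")
```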

It seems to be an issue of instantiating or copying a GP (both with Laplace and above a certain size) within a new process. It seems highly odd and highly specific. Any help is greatly appreciated.

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Reactions: 1
  • Comments: 11 (3 by maintainers)

Top GitHub Comments

1 reaction
brendenpetersen commented, Mar 19, 2018

Hi @ahartikainen, I was not using Jupyter Notebook. Python was executed from command-line. So I don’t think those changes would fix the problem.

I’ve moved on from this project, but the issue was actually a limitation with Python’s multiprocessing, which uses OS pipes under the hood and is therefore limited by buffer sizes. This explains why the program works when size is small enough, as it puts it under the buffer size, and why it worked for @mzwiessele, whose OS likely had a different buffer size.

One (of many) explanations here: https://sopython.com/canon/82/programs-using-multiprocessing-hang-deadlock-and-never-complete/
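The failure mode described in that link can be reproduced with the stdlib alone: a child that sends a large object back through a multiprocessing.Queue must have its data drained before join(), because the child blocks while writing to the full underlying pipe, and join() then waits on a child that never exits. A minimal sketch of the correct ordering (the 4 MB payload is illustrative):

```python
# Sketch: drain the queue before joining the child. Reversing the
# q.get() and p.join() lines below can deadlock on payloads larger
# than the OS pipe buffer, which is the behavior described above.
from multiprocessing import Process, Queue

def worker(q):
    q.put(b"x" * (1 << 22))  # ~4 MB, far larger than a typical pipe buffer

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    data = q.get()    # drain the child's output first...
    p.join()          # ...then join
    print(len(data))  # 4194304
```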

0 reactions
patel-zeel commented, Feb 16, 2021

I had a similar problem while benchmarking on AMD and Intel CPUs: GPy multiprocessing performed very poorly on AMD but did well on Intel. Does anyone have a similar experience?
