Memory Leak in joblib.Parallel

Apologies if this issue has been flagged before; I didn’t see anything about it. I’ve been running the following code:

import multiprocessing

from joblib import Parallel, delayed
from scipy.stats import binom, hypergeom

# Assumption: num_cores is not defined in the original snippet; use all available cores.
num_cores = multiprocessing.cpu_count()

def compute(margin, N_wl, pi, alpha, n):
    '''Contribution to the unconditional power from a sample of size n.'''
    p1 = 0.5 + margin/2
    q_alpha = hypergeom.ppf(1-alpha, N_wl, N_wl/2, n) # upper alpha quantile of null distribution
    prob_select_N = binom.pmf(n, N_wl, pi) # probability of selecting n out of N_wl at sampling rate pi
    pvalue_nw = hypergeom.sf(q_alpha, N_wl, N_wl*p1, n) # probability of alternative distr falling above q_alpha
    return prob_select_N*pvalue_nw

def compute_unconditional_power(margin, N_wl, pi, alpha):
    '''
    Compute unconditional power of the test.

    margin = vote margin (votes for w / votes for w or l) in the population
    N_wl = the total number of ballots for either the winner or loser in the population,
    pi = the sampling probability,
    alpha = the type I error rate
    '''
    # Restrict the sum to sample sizes between the 0.5% and 99.5% quantiles.
    unlikely_draw_lower = binom.ppf(0.005, N_wl, pi)
    unlikely_draw_upper = binom.ppf(0.995, N_wl, pi)

    powers = Parallel(n_jobs=num_cores)(
        delayed(compute)(margin, N_wl, pi, alpha, n)
        for n in range(int(unlikely_draw_lower), int(unlikely_draw_upper)))

    return sum(powers)

I’ve noticed two things: the way the parallel processes get spun up in 0.12.1 is different from 0.11, and this code, which works fine in 0.11, leaks memory in 0.12.1. I typically get the following error, which as far as I can tell is just joblib’s way of handling an OOM:

/usr/local/lib/python2.7/dist-packages/joblib/externals/loky/process_executor.py:634:  UserWarning: A worker timeout while some jobs were given to the executor. You might want to use a longer timeout for the executor. 
  "the executor.", UserWarning
Traceback (most recent call last):
  File "gen_plot_data.py", line 214, in <module>
    main()
  File "gen_plot_data.py", line 165, in main
    bbp_ss = get_bbp_sample_size(prop_winner, Ntot, alpha)
  File "gen_plot_data.py", line 115, in get_bbp_sample_size
    quants[quant] = get_sample_for_power(margin, Ntot, alpha, quant/100.0, 1/float(Ntot))
  File "gen_plot_data.py", line 106, in get_sample_for_power
    x = compute_unconditional_power(margin, Ntot, pi, alpha)
  File "gen_plot_data.py", line 93, in compute_unconditional_power
    for n in range(int(unlikely_draw_lower), int(unlikely_draw_upper)))
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 962, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 865, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/local/lib/python2.7/dist-packages/joblib/_parallel_backends.py", line 515, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/joblib/externals/loky/_base.py", line 431, in result
    return self.__get_result()
  File "/usr/local/lib/python2.7/dist-packages/joblib/externals/loky/_base.py", line 382, in __get_result
    raise self._exception
joblib.externals.loky.process_executor.BrokenProcessPool: A process in the executor was terminated abruptly while the future was running or pending.
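
The traceback above passes through joblib/externals/loky, the process backend that joblib uses by default from 0.12 onward. One way to check whether a problem is tied to that backend is to request a backend explicitly through the `backend` argument of `Parallel`. The following is only a minimal sketch: `half` and `run_with_backend` are hypothetical names that do not appear in the issue, and the toy workload stands in for the real computation.

```python
from joblib import Parallel, delayed


def half(x):
    """Hypothetical stand-in for the real per-item computation."""
    return x / 2.0


def run_with_backend(values, backend="loky", n_jobs=2):
    """Run half() over values with an explicitly chosen joblib backend."""
    return Parallel(n_jobs=n_jobs, backend=backend)(
        delayed(half)(v) for v in values
    )


if __name__ == "__main__":
    # Same workload under the default loky backend and the multiprocessing backend.
    print(run_with_backend([2, 4, 6], backend="loky"))
    print(run_with_backend([2, 4, 6], backend="multiprocessing"))
```

If memory use differs noticeably between the two runs on the real workload, that narrows the problem down to how the workers are started and reused rather than to the computation itself.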

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 33 (22 by maintainers)

Top GitHub Comments

6 reactions
YubinXie commented, Jun 3, 2019

Python 3.6, joblib version 0.12.5. I am getting the following warnings, but the script was able to finish (I have 10 workers).

/Users/xi/miniconda3/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py:700: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
  "timeout or by a memory leak.", UserWarning
/Users/xi/miniconda3/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py:700: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
  "timeout or by a memory leak.", UserWarning
/Users/xi/miniconda3/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py:700: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
  "timeout or by a memory leak.", UserWarning
/Users/xi/miniconda3/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py:700: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
  "timeout or by a memory leak.", UserWarning

Does this information mean the calculation in my code is wrong? I am not sure if I can believe the results… When I run them independently, I get the same results. So it seems the results are solid. But the warning still worries me…
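
The check described in this comment (running the same work outside the pool and comparing) can be sketched as follows. This is an illustration only: `square` and `results_match` are hypothetical names, and `square` stands in for whatever function is actually being parallelized.

```python
from joblib import Parallel, delayed


def square(x):
    """Hypothetical stand-in for the function being parallelized."""
    return x * x


def results_match(func, inputs, n_jobs=2):
    """Return True if the parallel run reproduces the sequential results."""
    inputs = list(inputs)
    sequential = [func(x) for x in inputs]
    # Parallel returns results in the same order as the inputs.
    parallel = Parallel(n_jobs=n_jobs)(delayed(func)(x) for x in inputs)
    return sequential == parallel


if __name__ == "__main__":
    print(results_match(square, range(10)))
```

For floating-point results, an approximate comparison may be more appropriate than exact equality.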

1 reaction
yairdaon commented, Aug 30, 2018

I had a similar problem. Following @ogrisel’s suggestion above, forcing garbage collection via gc.collect() inside the inner function (the one I was calling in parallel) seems to have resolved it. Thanks.
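
A minimal sketch of the workaround described above, assuming a toy workload: `work` and `run_all` are hypothetical names, not code from the issue, and the list allocation merely stands in for whatever temporary objects the real inner function creates.

```python
import gc

from joblib import Parallel, delayed


def work(n):
    """Hypothetical inner function: builds a temporary list, then frees it."""
    scratch = [i * i for i in range(n)]  # short-lived allocation
    total = sum(scratch)
    del scratch
    gc.collect()  # force a collection inside the worker before returning
    return total


def run_all(sizes, n_jobs=2):
    """Call work() for each size; with n_jobs > 1 the calls run in worker processes."""
    return Parallel(n_jobs=n_jobs)(delayed(work)(n) for n in sizes)


if __name__ == "__main__":
    print(run_all([10, 100, 1000]))
```

The explicit `gc.collect()` call adds some overhead per task, so it may be worth measuring whether it is still needed once the underlying issue is fixed in the installed joblib version.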
