Why does Dynesty multiprocessing with ThreadPoolExecutor not use all cores?
I am trying to run a simple example with nested sampling and dynesty. I installed dynesty from GitHub: https://github.com/joshspeagle/dynesty
Computer Setup
- OS: Mac OSX El Capitan (10.11.6)
- CPU: 8 cores
- RAM: 16.0 GB
- gcc: 4.8.5 via conda install gcc
Problem Setup
I ran the code below (simulate data, set up the prior/likelihood, submit to dynesty). To set up multiprocessing, I used multiprocessing.Pool, concurrent.futures.ProcessPoolExecutor, and concurrent.futures.ThreadPoolExecutor. I tried the code in Jupyter Lab, ipython, and as a script (python run_dynesty_test.py).
Problem: The entire script runs fine, but dynesty/python starts out using all of the cores, then slowly uses fewer and fewer of them. Finally, after about 5 minutes, dynesty/python uses almost exactly 1 core.
Evidence: htop starts reading 780%, then 550%, then 350%, then 100% CPU, and it stays at 100% CPU for the rest of the run, except that once every other minute htop will read ~250-300% CPU.
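One way to quantify this beyond watching htop is to sample the sampler process's CPU usage over time. A minimal sketch, assuming the third-party psutil package (not part of the original report); `sampler_pid` is a hypothetical placeholder for the PID of the python process running dynesty:

```python
# Sketch: record the CPU usage of the running sampler process over time.
# `sampler_pid` is a hypothetical placeholder for the dynesty process's PID.
import psutil

proc = psutil.Process(sampler_pid)
for _ in range(30):
    # Like htop, this can exceed 100% when multiple cores are busy.
    print(proc.cpu_percent(interval=10))
```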
Question
Why is dynesty/ThreadPoolExecutor/python/etc not using all of my cores all of the time?
Code Snippet Involving Dynesty and Multiprocessing
```python
with ThreadPoolExecutor(max_workers=cpu_count() - 1) as executor:
    sampler = dynesty.DynamicNestedSampler(
        loglike,
        prior,
        ndim=ndims,
        npdim=ndims,
        bound='multi',
        sample='unif',
        pool=executor,
        queue_size=cpu_count())
    sampler.run_nested(nlive_init=100, nlive_batch=100)
    res = sampler.results
```
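For comparison, the report also mentions trying process-based pools. Since a pure-Python likelihood holds the GIL, a ThreadPoolExecutor cannot run likelihood evaluations truly in parallel; a process pool avoids that. A minimal sketch of the same setup with multiprocessing.Pool (a variant for comparison, not the code from the original report; it assumes loglike, prior, and ndims are defined as in the full script below):

```python
# Sketch: same sampler, but with a process-based pool so that pure-Python
# likelihood calls are not serialized by the GIL. Assumes loglike, prior,
# and ndims are defined as in the full script below.
from multiprocessing import Pool, cpu_count
import dynesty

if __name__ == '__main__':
    with Pool(cpu_count() - 1) as pool:
        sampler = dynesty.DynamicNestedSampler(
            loglike,
            prior,
            ndim=ndims,
            bound='multi',
            sample='unif',
            pool=pool,
            queue_size=cpu_count() - 1)
        sampler.run_nested(nlive_init=100, nlive_batch=100)
        res = sampler.results
```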
Full Script to Set Up the Test
```python
from __future__ import absolute_import, unicode_literals, print_function

from multiprocessing import set_start_method
set_start_method('forkserver')

import os
import joblib  # needed to save the results at the end
import numpy as np
import dynesty

from multiprocessing import cpu_count
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# plotting
from matplotlib import pyplot as plt

if not os.path.exists("chains"):
    os.mkdir("chains")

def model(cube):
    """1D Gaussian whose center, width, and height are taken from `cube`."""
    center, width, height = cube[0], cube[1], cube[2]
    return lambda y: height * np.exp(-0.5 * ((center - y) / width)**2)

np.random.seed(42)

# true parameters of the two Gaussian components
param0a, param0b = -0.5, 0.5
param1a, param1b = 0.1, 0.1
param2a, param2b = 0.8, 0.8

yunc = 0.1
nPts = int(100)
nThPts = int(1e3)

xmin, xmax = -1, 1
dx = 0.1 * (xmax - xmin)

yuncs = np.random.normal(yunc, 1e-2 * yunc, nPts)
thdata = np.linspace(xmin - dx, xmax + dx, nThPts)
xdata = np.linspace(xmin, xmax, nPts)

ydata = model([param0a, param1a, param2a])(xdata) \
      + model([param0b, param1b, param2b])(xdata)
yerr = np.random.normal(0, yuncs, nPts)
zdata = ydata + yerr

plt.figure(figsize=(10, 10))
plt.plot(thdata, model([param0a, param1a, param2a])(thdata)
               + model([param0b, param1b, param2b])(thdata))
plt.errorbar(xdata, zdata, yunc * np.ones(zdata.size), fmt='o')
plt.show()

def prior(cube):
    """Map the unit cube to the prior ranges."""
    cube[0] = cube[0]*2 - 1  # center in [-1, 1]
    cube[1] = cube[1]*2      # width in [0, 2]
    cube[2] = cube[2]*2      # height in [0, 2]
    return cube

def loglike(cube):
    """Gaussian (chi-squared) log-likelihood of the simulated data."""
    modelNow = model(cube)(xdata)
    return -0.5 * ((modelNow - ydata)**2. / yuncs**2.).sum()

if __name__ == '__main__':
    ndims = 3

    with ThreadPoolExecutor(max_workers=cpu_count() - 1) as executor:
        sampler = dynesty.DynamicNestedSampler(
            loglike,
            prior,
            ndim=ndims,
            npdim=ndims,
            bound='multi',
            sample='unif',
            pool=executor,
            queue_size=cpu_count())
        sampler.run_nested(nlive_init=100, nlive_batch=100)
        res = sampler.results

    from dynesty import plotting as dyplot

    # evidence check
    fig, axes = dyplot.runplot(res, color='red',
                               truth_color='black',
                               logplot=True)
    fig.tight_layout()

    joblib.dump(res, 'dynesty_double_gaussian_test_results.joblib.save')
```
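To pick the saved results back up in a later session, a small usage sketch (assuming the file written by the script above):

```python
# Sketch: reload the results saved by the script above.
import joblib

res = joblib.load('dynesty_double_gaussian_test_results.joblib.save')
res.summary()  # print a short summary of the run (iterations, calls, logz)
```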
Comments
Yep, can confirm that dynesty is using all the cores on my machine. The variation in behavior is due to overhead in between allocating batches (which requires operations done in serial), but during a single batch I can get up to full usage of the cores. My guess is you're seeing ~50% usage overall just because evaluating the likelihood is essentially instantaneous, so the overhead takes up a significant portion of the runtime.

The first thing to say is that I don't know what virtual cores are. But I'm using:
- OSX 10.14 (Mojave), MacBook Pro (Mid 2014), 2.5 GHz Intel Core i7, gcc 4.8.5

multiprocessing.cpu_count() registers 8 cores; I think that is 4 cores with 2 threads each, but this is where I get confused. I also tried it on 2 Linux servers that I have access to:

- RHEL6 x86_64, kernel 2.6.32-696.23.1.el6.x86_64, 12 and 24 CPUs, gcc 4.4.7
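For what it's worth, the logical/physical distinction can be checked directly. A minimal sketch, assuming the third-party psutil package (not something used in this thread):

```python
# Sketch: distinguish logical CPUs (hyperthreads) from physical cores.
# Uses the third-party `psutil` package, which is an assumption here.
import multiprocessing
import psutil

print(multiprocessing.cpu_count())      # logical CPUs, e.g. 8
print(psutil.cpu_count(logical=False))  # physical cores, e.g. 4
```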
TL;DR In all 3 cases (Mac + 2 Linux), dynesty seems to use almost exactly half of my CPUs (50% ± 5%). It oscillates so consistently around 50% that it feels as though it were programmed to use exactly 50% of full CPU usage, with room for statistical uncertainty. Sometimes it grows to 70% for < 1 minute (~30 seconds), but falls back down.

With other software (i.e. tensorflow), Stack Overflow-level information informed me that "some problems are not worth all of your CPUs and tensorflow decides that on the fly". Is it possible that the multiprocessing library is not requesting all of the CPUs? At the same time, you said that you tried my code above and it used all of your CPUs.
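If the maintainer's diagnosis is right (the likelihood is essentially instantaneous, so serial overhead dominates), then making each likelihood call more expensive should push CPU usage back up. A minimal sketch of that experiment, an assumption about how to test the diagnosis rather than something from the thread:

```python
# Sketch: add CPU-bound cost to each likelihood call so that parallel work
# dominates the serial between-batch overhead. NumPy's linear-algebra
# routines generally release the GIL, so this can scale even under threads.
# `loglike` is the likelihood defined in the script above.
import numpy as np

_busywork = np.random.rand(200, 200)  # hypothetical extra work per call

def slow_loglike(cube):
    np.linalg.svd(_busywork)  # burn CPU; result is deliberately discarded
    return loglike(cube)
```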