lmplot() uses all server CPUs - any method to limit cores?
When executing sns.lmplot() on a DataFrame with 500k rows and an x-axis aggregation function (np.mean), seaborn uses all available CPU cores on a 4-CPU server machine (with XXX cores available). We run Python in Docker containers without any CPU limits imposed by the orchestrating platform (OpenShift). This used to work, because data-munging and plotting algorithms used to be single-threaded, and statistical modeling libraries always give us a way to limit the number of threads (we typically tune performance at around 8-32 cores, depending on the algorithm). We need such a setting in seaborn as well (it probably needs to be passed down to the underlying statistical functions).
The exact code to reproduce the issue (on a multi-core machine, using a DataFrame df filled with two columns of 500k random numbers):
import numpy as np
import seaborn as sns

sns.lmplot(x='IntVar1', y='FloatVar1', data=df, x_estimator=np.mean, x_bins=10)
Python 3.7. Package versions:
- seaborn: 0.10.0
- matplotlib: 3.1.2
- numpy: 1.18.1
- pandas: 1.0.1
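A possible mitigation, not mentioned in the original report and assuming the parallelism comes from the BLAS library that numpy links against (OpenBLAS or MKL) rather than from seaborn itself, is to cap the BLAS thread pool around the plotting call with threadpoolctl; the thread limit and column names below are illustrative:

import numpy as np
import pandas as pd
import seaborn as sns
from threadpoolctl import threadpool_limits  # third-party: pip install threadpoolctl

# Synthetic data matching the repro description: two columns, 500k rows of random numbers.
df = pd.DataFrame({
    'IntVar1': np.random.randint(0, 10, 500_000),
    'FloatVar1': np.random.rand(500_000),
})

# Cap OpenBLAS/MKL at 4 worker threads (an illustrative limit) only while the
# plot is built; the bootstrapped regression fit presumably spends its CPU time
# in these BLAS calls.
with threadpool_limits(limits=4, user_api='blas'):
    sns.lmplot(x='IntVar1', y='FloatVar1', data=df, x_estimator=np.mean, x_bins=10)

Setting OMP_NUM_THREADS, OPENBLAS_NUM_THREADS, or MKL_NUM_THREADS in the container environment before the Python process starts should have the same effect process-wide.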
Top GitHub Comments
I have found a hack (see #347) to speed things up from minutes to seconds by switching off lmplot’s computationally intensive bootstrapped confidence intervals (ci=None), but this is just masking the core issue of lmplot using all CPU threads without any method to limit such heavy usage.

No idea what adfuller is, @GF-Huang; pretty sure you’re asking in the wrong channel.
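For reference, the ci=None workaround described above amounts to the call below (reusing the illustrative df and columns from the repro):

sns.lmplot(x='IntVar1', y='FloatVar1', data=df, x_estimator=np.mean, x_bins=10, ci=None)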