Oversubscription in HistGradientBoosting with pytest-xdist
When running tests with pytest-xdist on a machine with 12 physical CPUs, the use of OpenMP in HistGradientBoosting seems to lead to significant over-subscription:
Running `pytest sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py -v` for me takes 0.85s. This runs 2 doctests, training a GBDT classifier and a regressor on the iris and boston datasets respectively.
- Running with 2 parallel processes (`-n 2`) takes 56s (and 50 threads are created).
- Running with 2 processes and `OMP_NUM_THREADS=2` takes 0.52s.
While I understand the case of catastrophic oversubscription when `N_CPU_THREADS**2` threads are created on a machine with many cores, here we create only `2*N_CPU_THREADS` as compared to `1*N_CPU_THREADS`, and get a ~100x slowdown (56s vs 0.52s).
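The thread-count arithmetic above can be sketched explicitly. The core count of 12 is taken from the report; the rest is an illustration of the default OpenMP behaviour, not measured output:

```python
# Sketch of the thread-count arithmetic described above.
# Assumptions: 12 physical cores (from the report), 2 pytest-xdist workers.
N_CPU_THREADS = 12    # physical cores on the reporter's machine
N_XDIST_WORKERS = 2   # pytest -n 2

# Each worker process gets its own OpenMP runtime, and each runtime
# defaults to one thread per CPU, so the totals are:
threads_single_process = 1 * N_CPU_THREADS            # plain pytest
threads_with_xdist = N_XDIST_WORKERS * N_CPU_THREADS  # pytest -n 2
threads_nested = N_CPU_THREADS ** 2                   # the classic worst case

print(threads_single_process, threads_with_xdist, threads_nested)
# 12 24 144
```

So the benign-looking `2*N_CPU_THREADS` case is nowhere near the `N_CPU_THREADS**2` worst case, which is what makes the observed slowdown surprising.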
Can someone reproduce it? Here I'm using scikit-learn master, and a conda env on Linux with latest numpy, scipy, nomkl, python=3.7.
Because pytest-xdist uses its own parallelism system (I'm not sure what it does exactly), I guess this won't be addressed by threadpoolctl (https://github.com/scikit-learn/scikit-learn/issues/14979)?
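Since each xdist worker is a separate process, one way to cap the per-worker OpenMP pool is via the environment, set before the compiled extensions are first imported, e.g. in a `conftest.py`. This is only a sketch of a possible workaround, not something prescribed in the issue:

```python
# conftest.py -- hypothetical workaround sketch, not from the issue itself.
# OMP_NUM_THREADS must be set before the OpenMP runtime is initialised,
# i.e. before the first import of the compiled scikit-learn extensions.
import os

# Cap each xdist worker at 2 OpenMP threads; with `pytest -n 2` this gives
# 2 workers * 2 threads = 4 threads total instead of 2 * n_cores.
os.environ.setdefault("OMP_NUM_THREADS", "2")
```

Using `setdefault` keeps an explicitly exported `OMP_NUM_THREADS` from the shell authoritative over the in-tree default.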
Edit: Originally reported in https://github.com/tomMoral/loky/issues/224
Issue Analytics
- State:
- Created 4 years ago
- Comments: 11 (11 by maintainers)
The fact that it is so catastrophic even on a small number of cores is intriguing though. @jeremiedbb @NicolasHug maybe you have an idea why this is happening more specifically for HistGradientBoostingClassifier/Regressor?
I wonder why we don’t have a similarly scaled over-subscription problem with MKL or OpenBLAS thread pools.
There is something weird. On my laptop (2 cores, 4 hyperthreads):

- `pytest -v sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py` => 0.70s (no xdist)
- `pytest -v -n 1 sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py` => 1.46s (1 xdist worker)
- `pytest -v -n 2 sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py` => 11s to 48s (2 xdist workers)
- `OMP_NUM_THREADS=2 pytest -v -n 2 sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py` => 1.15s
- `OMP_NUM_THREADS=4 pytest -v -n 2 sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py` => between 7.8s and 34s

So this seems to be a really extreme case of over-subscription.
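For a rough sense of scale, the slowdown factors implied by these laptop timings can be computed directly; this is pure arithmetic on the numbers reported above:

```python
# Rough slowdown factors implied by the laptop timings above
# (2 cores / 4 hyperthreads, 2 xdist workers).
capped = 1.15                       # OMP_NUM_THREADS=2, pytest -n 2
uncapped_lo, uncapped_hi = 11.0, 48.0  # pytest -n 2, OpenMP unconstrained

print(round(uncapped_lo / capped, 1), round(uncapped_hi / capped, 1))
# 9.6 41.7
```

So relative to the `OMP_NUM_THREADS=2` run, the unconstrained run is roughly 10x to 42x slower on only 2 physical cores.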