Oversubscription in HistGradientBoosting with pytest-xdist
When running tests with pytest-xdist on a machine with 12 physical CPUs, the use of OpenMP in HistGradientBoosting seems to lead to significant over-subscription:
Running `pytest sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py -v` for me takes 0.85s. This runs 2 doctests, training a GBDT classifier and a regressor on the iris and boston datasets respectively.
- Running with 2 parallel processes (`-n 2`) takes 56s (and 50 threads are created).
- Running with 2 processes and `OMP_NUM_THREADS=2` takes 0.52s.
While I understand the case of catastrophic oversubscription when `N_CPU_THREADS**2` threads are created on a machine with many cores, here we create only `2*N_CPU_THREADS` as compared to `1*N_CPU_THREADS`, and get a ~100x slowdown (56s vs 0.52s).
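The thread-count arithmetic above can be sketched explicitly. The core count of 12 is taken from the report; the rest is an illustration of the default OpenMP behaviour, not measured output:

```python
# Sketch of the thread-count arithmetic described above.
# Assumptions: 12 physical cores (from the report), 2 pytest-xdist workers.
N_CPU_THREADS = 12    # physical cores on the reporter's machine
N_XDIST_WORKERS = 2   # pytest -n 2

# Each worker process gets its own OpenMP runtime, and each runtime
# defaults to one thread per CPU, so the totals are:
threads_single_process = 1 * N_CPU_THREADS            # plain pytest
threads_with_xdist = N_XDIST_WORKERS * N_CPU_THREADS  # pytest -n 2
threads_nested = N_CPU_THREADS ** 2                   # the classic worst case

print(threads_single_process, threads_with_xdist, threads_nested)
# 12 24 144
```

So the benign-looking `2*N_CPU_THREADS` case is nowhere near the `N_CPU_THREADS**2` worst case, which is what makes the observed slowdown surprising.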
Can someone reproduce it? Here I'm using scikit-learn master, and a conda env on Linux with latest numpy, scipy, nomkl, python=3.7.
Because pytest-xdist uses its own parallelism system (I'm not sure what it does exactly), I guess this won't be addressed by threadpoolctl (https://github.com/scikit-learn/scikit-learn/issues/14979)?
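Since each xdist worker is a separate process, one way to cap the per-worker OpenMP pool is via the environment, set before the compiled extensions are first imported, e.g. in a `conftest.py`. This is only a sketch of a possible workaround, not something prescribed in the issue:

```python
# conftest.py -- hypothetical workaround sketch, not from the issue itself.
# OMP_NUM_THREADS must be set before the OpenMP runtime is initialised,
# i.e. before the first import of the compiled scikit-learn extensions.
import os

# Cap each xdist worker at 2 OpenMP threads; with `pytest -n 2` this gives
# 2 workers * 2 threads = 4 threads total instead of 2 * n_cores.
os.environ.setdefault("OMP_NUM_THREADS", "2")
```

Using `setdefault` keeps an explicitly exported `OMP_NUM_THREADS` from the shell authoritative over the in-tree default.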
Edit: Originally reported in https://github.com/tomMoral/loky/issues/224
Issue Analytics
- State:
- Created 4 years ago
- Comments: 11 (11 by maintainers)
The fact that it is so catastrophic even on a small number of cores is intriguing though. @jeremiedbb @NicolasHug maybe you have an idea why this is happening more specifically for HistGradientBoostingClassifier/Regressor?
I wonder why we don’t have a similarly scaled over-subscription problem with MKL or OpenBLAS thread pools.
There is something weird. On my laptop (2 cores, 4 hyperthreads):

- `pytest -v sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py` => 0.70s (no xdist)
- `pytest -v -n 1 sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py` => 1.46s (1 xdist worker)
- `pytest -v -n 2 sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py` => 11s to 48s (2 xdist workers)
- `OMP_NUM_THREADS=2 pytest -v -n 2 sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py` => 1.15s
- `OMP_NUM_THREADS=4 pytest -v -n 2 sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py` => between 7.8s and 34s

So this seems to be a really extreme case of over-subscription.
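For a rough sense of scale, the slowdown factors implied by these laptop timings can be computed directly; this is pure arithmetic on the numbers reported above:

```python
# Rough slowdown factors implied by the laptop timings above
# (2 cores / 4 hyperthreads, 2 xdist workers).
capped = 1.15                       # OMP_NUM_THREADS=2, pytest -n 2
uncapped_lo, uncapped_hi = 11.0, 48.0  # pytest -n 2, OpenMP unconstrained

print(round(uncapped_lo / capped, 1), round(uncapped_hi / capped, 1))
# 9.6 41.7
```

So relative to the `OMP_NUM_THREADS=2` run, the unconstrained run is roughly 10x to 42x slower on only 2 physical cores.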