KMeans(init='k-means++') performance issue with OpenBLAS

I am opening this issue to investigate a performance problem that might be related to #17230.

I adapted the reproducer of #17230 to display more info and make it work on a medium-size random dataset.

from sklearn import cluster
from time import time
from pprint import pprint
from threadpoolctl import threadpool_info
import numpy as np


pprint(threadpool_info())
rng = np.random.RandomState(0)
data = rng.randn(5000, 50)
t0_global = time()
for k in range(1, 15):
    t0 = time()
    # print(f"Running k-means with k={k}: ", end="", flush=True)
    cluster.KMeans(
        n_clusters=k,
        random_state=42,
        n_init=10,
        max_iter=2000,
        algorithm='full',
        init='k-means++').fit(data)
    # print(f"{time() - t0:.3f} s")

print(f"Total duration: {time() - t0_global:.3f} s")

I ran this on Linux with scikit-learn master (therefore including the #16499 fix), with 2 different builds of scipy (with OpenBLAS from PyPI and with MKL from Anaconda) and various values for OMP_NUM_THREADS (unset, OMP_NUM_THREADS=1, OMP_NUM_THREADS=2, OMP_NUM_THREADS=4), on a laptop with 2 physical CPU cores (4 logical CPUs).

In both cases, I use the same scikit-learn binaries (built with GCC in editable mode). Only the environment changes.
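The sweep over OMP_NUM_THREADS values described above can be automated along the following lines. This is a minimal sketch rather than the procedure actually used for the reported runs: the helper names `child_env`, `run_with_threads` and `sweep` are hypothetical, and it assumes the reproducer has been saved to a script. Each setting runs in a fresh child process because the variable has to be in place before the OpenMP and BLAS runtimes are initialized.

```python
import os
import subprocess
import sys
import time


def child_env(omp_num_threads=None):
    """Return a copy of the environment with OMP_NUM_THREADS set or removed.

    omp_num_threads=None means the variable is left unset in the child.
    """
    env = os.environ.copy()
    env.pop("OMP_NUM_THREADS", None)
    if omp_num_threads is not None:
        env["OMP_NUM_THREADS"] = str(omp_num_threads)
    return env


def run_with_threads(cmd, omp_num_threads=None):
    """Run cmd in a child process and return its wall-clock duration in seconds."""
    t0 = time.perf_counter()
    subprocess.run(cmd, env=child_env(omp_num_threads), check=True)
    return time.perf_counter() - t0


def sweep(cmd, settings=(None, 1, 2, 4)):
    """Time cmd once per OMP_NUM_THREADS setting (None stands for unset)."""
    return {setting: run_with_threads(cmd, setting) for setting in settings}
```

With the reproducer saved under a hypothetical name such as `kmeans_repro.py`, calling `sweep([sys.executable, "kmeans_repro.py"])` returns one duration per setting. The measured time includes interpreter start-up and imports, so it is only meaningful when the k-means loop dominates the run.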

The summary is:

with MKL there is no problem: large or unset values of OMP_NUM_THREADS are faster than OMP_NUM_THREADS=1;
with OpenBLAS, leaving OMP_NUM_THREADS unset or setting it to a large value is significantly slower than a forced sequential run with OMP_NUM_THREADS=1.

I will include my runs in the first comment.

/cc @jeremiedbb

Top GitHub Comments

jeremiedbb commented, May 25, 2020

I think so, but it might be more specific. In k-means++ the matrix multiplication is between a (n_candidates, n_features) matrix and the transpose of the (n_samples, n_features) data matrix, and the number of candidates is small (~log(n_clusters)). It’s possible that MKL is better at optimizing a matrix multiplication where one dimension is much smaller than the other.
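To give a sense of how skinny that product is for the reproducer's dataset, here is a small arithmetic sketch. It assumes the candidate count follows the rule 2 + int(log(n_clusters)), which is my reading of scikit-learn's default and should be checked against the installed version. The helper names are hypothetical.

```python
import math

# Dataset shape used in the reproducer.
N_SAMPLES, N_FEATURES = 5000, 50


def n_candidates(n_clusters):
    """Assumed number of candidate centers drawn per k-means++ step."""
    return 2 + int(math.log(n_clusters))


def candidate_product(n_clusters, n_samples=N_SAMPLES, n_features=N_FEATURES):
    """Return the output shape and multiply-add count of candidates @ data.T."""
    rows = n_candidates(n_clusters)
    shape = (rows, n_samples)
    multiply_adds = rows * n_samples * n_features
    return shape, multiply_adds


# With a single cluster there is no additional center to sample, so start at 2.
for k in range(2, 15):
    shape, work = candidate_product(k)
    print(f"k={k:2d}: output shape {shape}, about {work:,} multiply-adds")
```

Under this assumption the left operand never has more than a handful of rows, so each call hands the BLAS very little work to split across threads.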

jeremiedbb commented, May 25, 2020

Actually, the k-means++ init is at fault! It scales poorly, especially when using hyperthreads. The timings are back to expected when you set the init to ‘random’, meaning that the OpenMP loop and the inner BLAS are properly controlled.

Maybe you can turn this issue into a k-means++ issue?
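One way to check the observation about init='random' is to time both initializations on the same data. The snippet below is a sketch of such a comparison, not the benchmark behind the numbers discussed in this thread: the helper `time_init` is hypothetical, and the dataset and number of restarts are deliberately reduced so that it runs quickly.

```python
from time import perf_counter

import numpy as np
from sklearn.cluster import KMeans


def time_init(init, data, k_values, n_init=10, max_iter=2000):
    """Return the total time spent fitting KMeans for every k with the given init."""
    t0 = perf_counter()
    for k in k_values:
        KMeans(
            n_clusters=k,
            init=init,
            n_init=n_init,
            max_iter=max_iter,
            random_state=42,
        ).fit(data)
    return perf_counter() - t0


rng = np.random.RandomState(0)
# Small stand-in dataset; the reproducer uses rng.randn(5000, 50) and k in range(1, 15).
data = rng.randn(500, 10)

timings = {
    init: time_init(init, data, range(1, 6), n_init=3)
    for init in ("k-means++", "random")
}
for init, duration in timings.items():
    print(f"init={init!r}: {duration:.3f} s")
```

To get close to the setup discussed here, scale the data and the range of k back up to the reproducer's values and repeat the comparison under each OMP_NUM_THREADS setting and BLAS build.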