sklearn.cluster.KMeans 0.23 is extra slower compared to 0.22.2

See original GitHub issue

Used code:

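from sklearn import cluster

# X is the reporter's data; it is not shown in the issue.
# The fit call is assumed: the snippet as posted only constructs the estimator.
for k in range(1, 15):  # k = 1 .. 14
    cluster.KMeans(
        n_clusters=k,
        random_state=42,
        n_init=10,
        max_iter=2000,
        algorithm='full',
        init='k-means++',
    ).fit(X)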

Expected Results

In v0.22.2, the computation finished in 2 minutes for the whole set of 15 explored values of k.

Actual Results

Computation takes more than 20 minutes with exactly the same data and setup as before. Computation also takes a very long time even with k=1, whereas in the previous version a lower k meant much faster computation.
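To compare the two versions on equal footing, the per-k fit time could be measured with a small harness such as the one below. This is only a sketch: the issue does not include the reporter's dataset or timing code, so the helper names time_fits and make_kmeans are hypothetical, and any data passed in would be a stand-in for the original.

```python
import time


def time_fits(make_estimator, X, ks):
    """Fit make_estimator(k) on X once per k and return {k: seconds}."""
    timings = {}
    for k in ks:
        estimator = make_estimator(k)
        start = time.perf_counter()
        estimator.fit(X)
        timings[k] = time.perf_counter() - start
    return timings


def make_kmeans(k):
    """Build KMeans with the settings quoted in the report (hypothetical helper)."""
    # Imported lazily so that time_fits itself has no scikit-learn dependency.
    from sklearn.cluster import KMeans

    return KMeans(
        n_clusters=k,
        random_state=42,
        n_init=10,
        max_iter=2000,
        algorithm="full",  # accepted by 0.22 and 0.23; later releases call it "lloyd"
        init="k-means++",
    )
```

Running time_fits(make_kmeans, X, range(1, 15)) on the same array X in an environment with 0.22.2 and again in one with 0.23.0 would give two tables that can be compared k by k.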

Versions

System:
    python: 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]
    executable: C:\Users\micha\anaconda3\python.exe
    machine: Windows-10-10.0.18362-SP0

Python dependencies:
    pip: 20.0.2
    setuptools: 45.2.0.post20200210
    sklearn: 0.23.0
    numpy: 1.18.1
    scipy: 1.4.1
    Cython: 0.29.15
    pandas: 1.0.3
    matplotlib: 3.1.3
    joblib: 0.14.1

Built with OpenMP: True

Issue Analytics

State:
Created 3 years ago
Comments: 34 (21 by maintainers)

Top GitHub Comments

1 reaction

ogrisel commented, May 25, 2020

> I updated all packages today, and the new version (0.23.1) seems to have solved it; it is actually faster than 0.22.2. Thanks!!

Ok so part of the issue was fixed in #17235. The remainder will be tackled in #17334. Let’s close this.

1 reaction

MichalRIcar commented, May 25, 2020

Hello,

I updated all packages today, and the new version (0.23.1) seems to have solved it; it is actually faster than 0.22.2. Thanks!!
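Since the thread ties the slowdown to 0.23.0 specifically (0.22.2 was fast, and 0.23.1 is reported as solving it), a quick check of the installed version may help when triaging a similar report. The sketch below is an illustration only: parse_version and is_reported_slow_release are hypothetical helpers, the parsing ignores pre-release suffixes, and the result reflects nothing more than what this thread reports.

```python
import re


def parse_version(version):
    """Turn a release string such as '0.23.1' into a tuple like (0, 23, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. '0.23') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def is_reported_slow_release(version):
    """True only for 0.23.0, the release this thread reports as slow."""
    return parse_version(version) == (0, 23, 0)
```

With scikit-learn installed, is_reported_slow_release(sklearn.__version__) would flag an environment that is still on the release described in the report.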
