sklearn.cluster.KMeans in 0.23 is much slower than in 0.22.2
Used code:
from sklearn import cluster

# Sweep over candidate cluster counts.
for k in range(1, 15):
    cluster.KMeans(
        n_clusters=k,
        random_state=42,
        n_init=10,
        max_iter=2000,
        algorithm='full',
        init='k-means++',
    )  # the fit call and the data are not shown in the issue
Expected Results
In v0.22.2, the computation finished in 2 minutes for the whole set of 15 explored values of k.
Actual Results
The computation takes more than 20 minutes with exactly the same data and setup as before. Also, even with k=1 the computation takes a very long time, whereas in the previous version a lower k meant a much faster computation.
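A minimal sketch of how the slowdown could be measured per k, so that timings from two scikit-learn versions can be compared side by side. The data here is a synthetic stand-in (the reporter's dataset is not shown in the issue), and `time_kmeans` is a hypothetical helper name, not part of scikit-learn. The `algorithm` argument is left at its default in this sketch because the accepted names differ between releases ('full' in the versions discussed here, 'lloyd' in later ones).

```python
import time

import numpy as np
from sklearn.cluster import KMeans


def time_kmeans(X, k_values, **kmeans_kwargs):
    """Hypothetical helper: return {k: seconds} for fitting KMeans with each k."""
    timings = {}
    for k in k_values:
        model = KMeans(n_clusters=k, **kmeans_kwargs)
        start = time.perf_counter()
        model.fit(X)
        timings[k] = time.perf_counter() - start
    return timings


# Synthetic stand-in data (assumption): 500 samples, 10 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))

# Same settings as in the report, except for `algorithm` (see note above).
timings = time_kmeans(
    X,
    range(1, 15),
    random_state=42,
    n_init=10,
    max_iter=2000,
    init="k-means++",
)

for k, seconds in timings.items():
    print(f"k={k:2d}: {seconds:.3f} s")
```

Running the same script in an environment with 0.22.2 and in one with 0.23.0 would show whether the regression also appears for small k, as described above.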
Versions
System:
    python: 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]
    executable: C:\Users\micha\anaconda3\python.exe
    machine: Windows-10-10.0.18362-SP0
Python dependencies:
    pip: 20.0.2
    setuptools: 45.2.0.post20200210
    sklearn: 0.23.0
    numpy: 1.18.1
    scipy: 1.4.1
    Cython: 0.29.15
    pandas: 1.0.3
    matplotlib: 3.1.3
    joblib: 0.14.1
Built with OpenMP: True
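The version block above has the shape of the report printed by `sklearn.show_versions()`. A short sketch of how to produce it for another environment, for example after upgrading:

```python
import sklearn

# Print system info, dependency versions, and build details
# for the currently installed scikit-learn.
sklearn.show_versions()

# The installed version alone is available as a string.
installed_version = sklearn.__version__
print("scikit-learn:", installed_version)
```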
Issue Analytics
- Created 3 years ago
- Comments: 34 (21 by maintainers)
Top GitHub Comments
Ok so part of the issue was fixed in #17235. The remainder will be tackled in #17334. Let’s close this.
Hello,
I updated all the packages I use today, and the new version (0.23.1) seems to have solved it; it is actually faster than 0.22.2. Thanks!!