KMeans significantly slower on 0.23

Describe the bug

With the latest changes, KMeans is significantly slower on small datasets. The time needed to compute clusters is around ten times longer.

Steps/Code to Reproduce

Timings with the following code: scikit-learn 0.22: ~0.015 s; scikit-learn 0.23: ~0.15 s

import time

import sklearn.cluster
from sklearn import datasets

data = datasets.load_iris()['data']

t = time.time()
sklearn.cluster.KMeans(n_clusters=2).fit(data)
print(time.time() - t)

I also tried a bigger dataset with shape (300, 25), where clustering with the new version took 3-4 s, while before it finished in milliseconds.
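A single `time.time()` measurement can be noisy for runs this short, so a repeated measurement may make the comparison between versions easier to trust. The following is only a sketch, not part of the original report: the helper names `time_call`, `make_kmeans_fit`, and `report_kmeans_timing` are hypothetical, and the choice of 5 repeats is an arbitrary assumption.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Call fn() `repeats` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def make_kmeans_fit():
    """Build a zero-argument callable that fits KMeans on the iris data.

    scikit-learn is imported here so that time_call stays usable without it,
    and the data is loaded once, outside the timed region.
    """
    import sklearn.cluster
    from sklearn import datasets

    data = datasets.load_iris()["data"]
    return lambda: sklearn.cluster.KMeans(n_clusters=2).fit(data)


def report_kmeans_timing(repeats=5):
    """Print the fastest and median fit time for the installed scikit-learn."""
    import sklearn

    durations = time_call(make_kmeans_fit(), repeats=repeats)
    print(f"scikit-learn {sklearn.__version__}")
    print(f"min:    {min(durations):.4f} s")
    print(f"median: {statistics.median(durations):.4f} s")
```

Calling `report_kmeans_timing()` once in a 0.22 environment and once in a 0.23 environment should give numbers that can be compared directly with the timings quoted above.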

Expected Results

Clusters would be computed as fast as before.

Versions

System:
    python: 3.7.6 | packaged by conda-forge | (default, Jan  7 2020, 22:05:27)  [Clang 9.0.1 ]
executable: /Users/primoz/miniconda3/envs/orange/bin/python
   machine: Darwin-19.0.0-x86_64-i386-64bit
Python dependencies:
       pip: 20.1
setuptools: 46.1.3
   sklearn: 0.23.0
     numpy: 1.18.4
     scipy: 1.4.1
    Cython: None
    pandas: 1.0.3
matplotlib: 3.2.1
    joblib: 0.14.1
Built with OpenMP: True

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 15 (11 by maintainers)

Top GitHub Comments

1 reaction
PrimozGodec commented, May 18, 2020

@jeremiedbb thank you for your help. I tested the PR and it now works normally.

0 reactions
jeremiedbb commented, May 18, 2020

Thanks @PrimozGodec! Closing.
