KMeans significantly slower on 0.23

Describe the bug

With the latest changes, KMeans is significantly slower on small datasets. The time needed to compute clusters is around ten times longer.

Steps/Code to Reproduce

Times (in seconds) with the following code are: scikit-learn 0.22: ~0.015; scikit-learn 0.23: ~0.15

import time

import sklearn.cluster
from sklearn import datasets

data = datasets.load_iris()['data']

t = time.time()
sklearn.cluster.KMeans(n_clusters=2).fit(data)
print(time.time() - t)
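
A single `time.time()` measurement can be noisy at this scale, so repeating the fit and looking at the minimum and median may give a steadier comparison between versions. The following is a minimal sketch under that assumption, using the same iris setup as the report; `time_calls` and `bench_kmeans` are hypothetical helper names, not part of scikit-learn.

```python
import time
from statistics import median


def time_calls(fn, repeats=5):
    """Call fn() `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def bench_kmeans(repeats=5):
    """Time KMeans(n_clusters=2).fit on the iris data, as in the report."""
    # Imported here so the timing helper above stays usable without scikit-learn.
    import sklearn
    from sklearn import datasets
    from sklearn.cluster import KMeans

    data = datasets.load_iris()["data"]
    durations = time_calls(lambda: KMeans(n_clusters=2).fit(data), repeats)
    print(
        f"scikit-learn {sklearn.__version__}: "
        f"min {min(durations):.4f} s, median {median(durations):.4f} s"
    )
    return durations
```

Calling `bench_kmeans()` in an environment with each scikit-learn version installed would print one line per run, which makes the two versions easier to compare side by side.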

I also tried a bigger dataset with shape (300, 25): clustering with the new version took 3-4 s, whereas previously it took milliseconds.
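
The (300, 25) dataset is not included in the report, so the sketch below substitutes synthetic Gaussian data of the same shape. The data, the seed, and `n_clusters=2` are assumptions for illustration only, and `make_synthetic_data` and `time_kmeans_on_synthetic` are hypothetical helper names; timings on this data may differ from the ones quoted above.

```python
import random
import time


def make_synthetic_data(n_samples=300, n_features=25, seed=0):
    """Return n_samples rows of n_features Gaussian values as a list of lists."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for _ in range(n_samples)
    ]


def time_kmeans_on_synthetic(n_clusters=2):
    """Fit KMeans once on the synthetic data and return the elapsed seconds."""
    # Imported lazily so the data helper works without scikit-learn installed.
    from sklearn.cluster import KMeans

    data = make_synthetic_data()  # stand-in for the unpublished (300, 25) data
    start = time.perf_counter()
    KMeans(n_clusters=n_clusters).fit(data)  # fit accepts any array-like
    return time.perf_counter() - start
```

Running `print(time_kmeans_on_synthetic())` under 0.22 and 0.23 would show whether the slowdown also appears on data of this shape.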

Expected Results

Clusters would be computed as fast as before.

Versions

System:
    python: 3.7.6 | packaged by conda-forge | (default, Jan  7 2020, 22:05:27)  [Clang 9.0.1 ]
executable: /Users/primoz/miniconda3/envs/orange/bin/python
   machine: Darwin-19.0.0-x86_64-i386-64bit
Python dependencies:
       pip: 20.1
setuptools: 46.1.3
   sklearn: 0.23.0
     numpy: 1.18.4
     scipy: 1.4.1
    Cython: None
    pandas: 1.0.3
matplotlib: 3.2.1
    joblib: 0.14.1
Built with OpenMP: True

Issue Analytics

Created 3 years ago
Comments: 15 (11 by maintainers)

Top GitHub Comments

PrimozGodec commented, May 18, 2020

@jeremiedbb thank you for your help. I tested the PR and it now works normally.

jeremiedbb commented, May 18, 2020

Thanks @PrimozGodec! Closing.
