Change KMeans n_init default value to 1.
See original GitHub issue

As mentioned in #9430, pydaal gets a 30x speedup, mostly because it ignores n_init.
I feel like n_init=10 is a pretty odd choice. We don't really do random restarts anywhere else by default (afaik), and in the age of large datasets this is really pretty expensive.
Not sure if it's worth a deprecation cycle, but this is pretty non-obvious behavior that potentially makes us look bad.
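The cost concern can be seen directly: each extra initialization reruns the full Lloyd iteration, so runtime scales roughly linearly with n_init. A minimal timing sketch (dataset sizes are illustrative assumptions, not taken from the issue):

```python
import time

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative synthetic dataset; the sizes here are assumptions for the sketch.
X, _ = make_blobs(n_samples=10_000, centers=8, n_features=16, random_state=0)

for n_init in (1, 10):
    start = time.perf_counter()
    km = KMeans(n_clusters=8, n_init=n_init, random_state=0).fit(X)
    elapsed = time.perf_counter() - start
    print(f"n_init={n_init:2d}  inertia={km.inertia_:.0f}  time={elapsed:.2f}s")
```

Ignoring the restarts entirely (as pydaal reportedly does) therefore accounts for most of the quoted speedup.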
Issue Analytics
- Created 6 years ago
- Comments:17 (16 by maintainers)
Top Results From Across the Web
sklearn.cluster.KMeans — scikit-learn 1.2.0 documentation
Changed in version 1.4: the default value of n_init changed from 10 to 'auto'. max_iter : int, default=300. Maximum number of iterations...
initial centroids for scikit-learn kmeans clustering
Yes, setting initial centroids via init should work. Here's a quote from scikit-learn documentation: init : {'k-means++', 'random' or an ...
KMeans Hyper-parameters Explained with Examples
There are other methods for unsupervised clustering, such as DBScan, ... n_init = By default is 10 and so the algorithm will initialize...
K-Means Optimization & Parameters - HolyPython.com
n_init : (default: 10) Another significant parameter n_init is used to define the number of initialization attempts for centroids of clusters.
In Depth: k-Means Clustering | Python Data Science Handbook
The k-means algorithm searches for a pre-determined number of clusters within an ... as indeed Scikit-Learn does by default (set by the n_init...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
In the example Empirical evaluation of the impact of k-means initialization, it does show that n_init > 1 leads to an improvement for init="random". For k-means++, n_init does not make a difference. If we go by the example, then we can have an n_init="auto", where n_init=1 for k-means++ and n_init=10 for random.

Yes, let's change it. Or default to have an inverse relationship with dataset size.
On 12 September 2017 at 00:55, Andreas Mueller notifications@github.com wrote: