Unused parameters in dask_ml.cluster.KMeans
See original GitHub issue.

The precompute_distances, copy_x, and n_jobs parameters for the KMeans class do not seem to be used beyond initialization. Are there plans to make use of these parameters?
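For illustration, a minimal sketch of the behavior being reported, assuming a dask-ml version from the time of this issue (the keyword values shown are arbitrary examples, not recommendations):

```python
import dask.array as da
from dask_ml.cluster import KMeans

# Sample data as a chunked dask array.
X = da.random.random((10_000, 4), chunks=(1_000, 4))

# All three keywords are accepted for scikit-learn API compatibility,
# but per this issue they are only stored on the estimator at
# __init__ time and never consulted during fit().
km = KMeans(n_clusters=8, precompute_distances="auto", copy_x=True, n_jobs=2)
km.fit(X)
```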
Issue Analytics
- State:
- Created 5 years ago
- Comments: 7 (5 by maintainers)
Top GitHub Comments
These are included solely to match the scikit-learn API.

copy_x doesn't make sense because dask arrays are immutable.

n_jobs doesn't really make sense. In scikit-learn, that controls the parallelism of the n_init different random initializations. We only do 1 initialization.

I'm not sure about precompute_distances. I suspect it makes less sense for Dask-ML, since the distances would have to move between workers, but I don't recall the scikit-learn implementation well.
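For context, a short sketch of the scikit-learn behavior the comment refers to. This reflects scikit-learn versions from around the time of this issue; n_jobs was later deprecated and removed from KMeans:

```python
from sklearn.cluster import KMeans

# In older scikit-learn, n_init controlled how many independent
# random initializations were run, and n_jobs parallelized those
# runs; the result with the lowest inertia was kept.
km = KMeans(n_clusters=8, n_init=10, n_jobs=4)

# dask-ml's k-means|| initializer performs a single initialization,
# so there is no equivalent work for n_jobs to parallelize.
```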
I think my general preference is to match the signature, but raise an exception when the value doesn't match the default. That makes it a bit easier to do prototyping, since you can more easily get_params() and set_params() between a scikit-learn model and a dask-ml model.
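A hypothetical sketch of that pattern follows; the class body and error messages are illustrative, not dask-ml's actual implementation:

```python
class KMeans:
    # Signature mirrors scikit-learn so get_params()/set_params()
    # round-trip cleanly between the two estimators.
    def __init__(self, n_clusters=8, precompute_distances="auto",
                 copy_x=True, n_jobs=1):
        self.n_clusters = n_clusters
        self.precompute_distances = precompute_distances
        self.copy_x = copy_x
        self.n_jobs = n_jobs

    def fit(self, X, y=None):
        # Raise on any non-default value for an unsupported keyword.
        if self.copy_x is not True:
            raise ValueError(
                "copy_x=False is unsupported: dask arrays are immutable"
            )
        if self.n_jobs != 1:
            raise ValueError(
                "n_jobs is unused: dask controls the parallelism"
            )
        # ... the actual k-means|| fitting would go here ...
        return self
```

Because the signatures match, get_params() on a scikit-learn estimator produces keyword arguments that set_params() on the dask-ml estimator can accept, which is what keeps the prototyping round-trip cheap.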