Unused parameters in dask_ml.cluster.KMeans

The precompute_distances, copy_x, and n_jobs parameters of the KMeans class do not seem to be used anywhere beyond initialization (__init__).

Are there plans for use of these parameters?

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
TomAugspurger commented, Aug 1, 2018

These are included solely to match the scikit-learn API.

copy_x doesn’t make sense because Dask arrays are immutable.
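To see why copy_x only matters for mutable inputs, here is a minimal pure-Python sketch, assuming the scikit-learn semantics in which copy_x=False allows the data to be centered in place. The helper center_data is hypothetical and uses lists of rows in place of arrays. An immutable container such as a Dask array has no in-place path at all, so the flag would have nothing to control.

```python
from typing import List

Matrix = List[List[float]]


def center_data(X: Matrix, copy_x: bool = True) -> Matrix:
    """Subtract the per-column mean from X (hypothetical helper).

    copy_x=True  -> build and return a new, centered matrix; X is untouched.
    copy_x=False -> overwrite the rows of X in place and return X itself.
    """
    n_rows = len(X)
    n_cols = len(X[0])
    means = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]

    if copy_x:
        # Allocate a new matrix; the caller's data is left unchanged.
        return [[row[j] - means[j] for j in range(n_cols)] for row in X]

    # Mutate the caller's rows directly (only possible for mutable data).
    for row in X:
        for j in range(n_cols):
            row[j] -= means[j]
    return X


data = [[1.0, 2.0], [3.0, 6.0]]
centered = center_data(data, copy_x=True)
# data is still [[1.0, 2.0], [3.0, 6.0]]; centered is [[-1.0, -2.0], [1.0, 2.0]]
```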

n_jobs doesn’t really make sense. In scikit-learn, that controls the parallelism of the n_init different random initializations. We only do one initialization.
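The sketch below illustrates what n_jobs parallelized in scikit-learn: several independent k-means runs from different random initializations, of which only the run with the lowest inertia is kept. It is a toy 1-D Lloyd's algorithm with hypothetical helper names (lloyd_1d, kmeans_n_init). Threads stand in for the process-based parallelism a real implementation would more likely use, since CPU-bound pure-Python code gains little from threads because of the GIL. With a single initialization, as in Dask-ML, there is nothing for n_jobs to distribute.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def lloyd_1d(points: List[float], k: int, seed: int, n_iter: int = 20) -> Tuple[List[float], float]:
    """One toy k-means run on 1-D data from a random initialization.

    Returns (centers, inertia), where inertia is the sum of squared
    distances from each point to its closest center.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization drawn from the data
    for _ in range(n_iter):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    inertia = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, inertia


def kmeans_n_init(
    points: List[float], k: int, n_init: int = 10, n_jobs: int = 1, seed: int = 0
) -> Tuple[List[float], float]:
    """Run n_init independent initializations, up to n_jobs at a time
    (n_jobs must be a positive int here), and keep the lowest-inertia run."""
    seeds = [seed + i for i in range(n_init)]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        runs = list(pool.map(lambda s: lloyd_1d(points, k, s), seeds))
    return min(runs, key=lambda run: run[1])
```

With n_init=1 the pool receives a single task, so any n_jobs value above 1 leaves the extra workers idle.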

I’m not sure about precompute_distances. I suspect it makes less sense for Dask-ML, since the distances would have to move between workers, but I don’t recall the scikit-learn implementation well.
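For context on what precompute_distances traded off, here is a minimal sketch, assuming the older scikit-learn meaning of the flag: either materialize the full n_samples x n_clusters distance matrix up front (faster, more memory) or compute each sample's distances on the fly. The function assign_labels and its max_entries cutoff for "auto" are hypothetical. In a distributed setting, that precomputed matrix is the kind of large intermediate that would have to move between workers.

```python
from typing import List, Sequence, Union

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_labels(
    X: List[Point],
    centers: List[Point],
    precompute_distances: Union[bool, str] = "auto",
    max_entries: int = 12_000_000,  # hypothetical cutoff used by "auto"
) -> List[int]:
    """Return the index of the closest center for every sample."""
    if precompute_distances == "auto":
        # Precompute only when the full matrix is small enough.
        precompute_distances = len(X) * len(centers) <= max_entries

    if precompute_distances:
        # Materialize the whole n_samples x n_clusters matrix first.
        dist = [[sq_dist(x, c) for c in centers] for x in X]
        return [row.index(min(row)) for row in dist]

    # Otherwise handle one sample at a time; only n_clusters
    # distances are held in memory at once.
    labels = []
    for x in X:
        row = [sq_dist(x, c) for c in centers]
        labels.append(row.index(min(row)))
    return labels
```

Both branches return the same labels; the flag only changes peak memory and speed.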

0 reactions
TomAugspurger commented, Apr 30, 2019

I think my general preference is to match the signature, but raise an exception when the value doesn’t match the default. That makes it a bit easier to do prototyping, since you can more easily get_params() and set_params() between a scikit-learn model and a dask-ml model.
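A minimal sketch of the "match the signature, but raise when the value doesn't match the default" approach follows. KMeansSketch, _UNUSED_DEFAULTS, and _check_unused_params are hypothetical names, the default values are assumptions, and get_params()/set_params() are hand-written here to mimic scikit-learn's versions so the example has no dependencies. The check runs in fit() rather than __init__, in line with the scikit-learn convention of not validating parameters in the constructor.

```python
from typing import Any, Dict


class KMeansSketch:
    """Hypothetical estimator mirroring part of a KMeans signature.

    Parameters with no meaning here are still accepted, so parameter
    dicts can be copied to and from a scikit-learn model, but fit()
    raises if they are set to anything other than their default.
    """

    # Parameters kept only for API compatibility, with assumed defaults.
    _UNUSED_DEFAULTS: Dict[str, Any] = {
        "precompute_distances": "auto",
        "copy_x": True,
        "n_jobs": 1,
    }

    def __init__(self, n_clusters=8, precompute_distances="auto", copy_x=True, n_jobs=1):
        self.n_clusters = n_clusters
        self.precompute_distances = precompute_distances
        self.copy_x = copy_x
        self.n_jobs = n_jobs

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        # deep is accepted for API compatibility; there are no nested estimators.
        return {name: getattr(self, name) for name in ("n_clusters", *self._UNUSED_DEFAULTS)}

    def set_params(self, **params: Any) -> "KMeansSketch":
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self

    def _check_unused_params(self) -> None:
        for name, default in self._UNUSED_DEFAULTS.items():
            value = getattr(self, name)
            if value != default:
                raise ValueError(
                    f"{name}={value!r} is not supported; it exists only for "
                    f"scikit-learn compatibility and must be {default!r}."
                )

    def fit(self, X):
        self._check_unused_params()
        # The actual clustering would go here.
        return self
```

Because the unused parameters still show up in get_params(), a parameter dict can be passed straight to set_params() on another model; the error only fires when a value this implementation cannot honor is actually requested.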

Top Results From Across the Web

dask_ml.cluster.KMeans — dask-ml 2022.5.28 documentation
This class implements a parallel and distributed version of k-Means. The default initializer for KMeans is k-means||, compared to k-means++ from scikit-learn. ......
dask-ml/k_means.py at main - cluster - GitHub
Sum of distances of samples to their closest cluster center. This class implements a parallel and distributed version of k-Means. *Scalable K-Means++ (2012)*. ......
Scale Machine Learning Code with Dask | Dask Summit 2021
Speakers: Andrew Mshar, Ryan Soley. Do you use the scikit-learn library to build machine learning models? In this tutorial, we'll discuss how ...
scikit-learn user guide
KMeans where the sample weights provided by the user were modified in ... pooling_func unused parameter in cluster.AgglomerativeClustering.
LightGBM - Read the Docs
The HDFS version of LightGBM was tested on CDH-5.14.4 cluster. ... be over-fitting if not used with the appropriate parameters.
