Unused parameters in dask_ml.cluster.KMeans
See original GitHub issue.

The precompute_distances, copy_x, and n_jobs parameters for the KMeans class do not seem to be used beyond initialization. Are there plans to make use of these parameters?
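For illustration, a minimal sketch of the behavior being reported, assuming a dask-ml version from the time of this issue (the keyword values shown are arbitrary examples, not recommendations):

```python
import dask.array as da
from dask_ml.cluster import KMeans

# Sample data as a chunked dask array.
X = da.random.random((10_000, 4), chunks=(1_000, 4))

# All three keywords are accepted for scikit-learn API compatibility,
# but per this issue they are only stored on the estimator at
# __init__ time and never consulted during fit().
km = KMeans(n_clusters=8, precompute_distances="auto", copy_x=True, n_jobs=2)
km.fit(X)
```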
Issue Analytics
- State:
- Created 5 years ago
- Comments: 7 (5 by maintainers)
Top GitHub Comments
These are included solely to match the scikit-learn API.

copy_x doesn't make sense because dask arrays are immutable.

n_jobs doesn't really make sense. In scikit-learn, that controls the parallelism of the n_init different random initializations. We only do 1 initialization.

I'm not sure about precompute_distances. I suspect it makes less sense for Dask-ML, since the distances would have to move between workers, but I don't recall the scikit-learn implementation well.
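For context, a short sketch of the scikit-learn behavior the comment refers to. This reflects scikit-learn versions from around the time of this issue; n_jobs was later deprecated and removed from KMeans:

```python
from sklearn.cluster import KMeans

# In older scikit-learn, n_init controlled how many independent
# random initializations were run, and n_jobs parallelized those
# runs; the result with the lowest inertia was kept.
km = KMeans(n_clusters=8, n_init=10, n_jobs=4)

# dask-ml's k-means|| initializer performs a single initialization,
# so there is no equivalent work for n_jobs to parallelize.
```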
I think my general preference is to match the signature, but raise an exception when the value doesn't match the default. That makes it a bit easier to do prototyping, since you can more easily get_params() and set_params() between a scikit-learn model and a dask-ml model.
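A hypothetical sketch of that pattern follows; the class body and error messages are illustrative, not dask-ml's actual implementation:

```python
class KMeans:
    # Signature mirrors scikit-learn so get_params()/set_params()
    # round-trip cleanly between the two estimators.
    def __init__(self, n_clusters=8, precompute_distances="auto",
                 copy_x=True, n_jobs=1):
        self.n_clusters = n_clusters
        self.precompute_distances = precompute_distances
        self.copy_x = copy_x
        self.n_jobs = n_jobs

    def fit(self, X, y=None):
        # Raise on any non-default value for an unsupported keyword.
        if self.copy_x is not True:
            raise ValueError(
                "copy_x=False is unsupported: dask arrays are immutable"
            )
        if self.n_jobs != 1:
            raise ValueError(
                "n_jobs is unused: dask controls the parallelism"
            )
        # ... the actual k-means|| fitting would go here ...
        return self
```

Because the signatures match, get_params() on a scikit-learn estimator produces keyword arguments that set_params() on the dask-ml estimator can accept, which is what keeps the prototyping round-trip cheap.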