[Enhancement] Add support for sample_weight in the fit function


The scikit-learn KMeans algorithm supports supplying a weight for each sample in the fit function. See the docs here.

Would it be possible to add this to the algorithm? That is, can we have the minimum and maximum bounds account for the sum of all weights instead of the count of all samples? I haven’t read into the MinCostFlow algorithm, so I don’t know how feasible this would be.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

4 reactions
joshlk commented, May 2, 2022

I’ve had a rethink…

I’ve had a look at how scikit-learn defines sample_weights:

The algorithm supports sample weights, which can be given by a parameter sample_weight. This allows to assign more weight to some samples when computing cluster centers and values of inertia. For example, assigning a weight of 2 to a sample is equivalent to adding a duplicate of that sample to the dataset.

Which I think is different from what I said:

I think it would be equivalent to weighting the distances.

and how you described it:

can we have the minimum and maximum bounds account for the sum of all weights instead of the count of all samples?

All of the above is possible; it’s just about figuring out what to weight. Feel free to have a shot at it. I will also have a longer think about what is needed.
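To make the scikit-learn definition quoted above concrete, here is a minimal sketch in plain Python. It does not call scikit-learn or this library; `weighted_centroid` is a hypothetical helper written only for illustration. It checks that giving a point a weight of 2 yields the same cluster center as adding a duplicate of that point.

```python
def weighted_centroid(points, weights):
    """Weighted mean of equal-length coordinate tuples (hypothetical helper)."""
    total = sum(weights)
    dim = len(points[0])
    return tuple(
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(dim)
    )


points = [(0.0, 0.0), (2.0, 0.0), (4.0, 6.0)]
weights = [1, 1, 2]  # the last point counts twice

# Build the equivalent unweighted dataset by repeating each point
# as many times as its integer weight.
duplicated = [p for p, w in zip(points, weights) for _ in range(w)]

weighted = weighted_centroid(points, weights)
from_duplicates = weighted_centroid(duplicated, [1] * len(duplicated))

print(weighted, from_duplicates)
```

This only covers the center computation for integer weights. It says nothing about how weights should interact with the cluster size bounds, which is the open question in this thread.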

1 reaction
hectoradrian961030 commented, Jul 3, 2022

@joshlk I think I have a similar need. In the problem I’m trying to solve, size_max is the sum of the weights of a cluster instead of the size of a cluster. A point of X is the centroid of a polygon and the weight of that polygon (point of X) is the sum of its vertices. Do you think the algorithm can be easily modified to handle this?
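As a rough illustration of the "sum of weights instead of count" interpretation discussed in this thread, the sketch below measures each cluster by the total weight of its members and reports which clusters fall outside the bounds. Both helper functions are hypothetical and not part of any library; `size_min` is named by analogy with the `size_max` mentioned above. The sketch only checks an existing assignment and does not modify or reproduce the MinCostFlow step.

```python
def weighted_cluster_sizes(labels, weights, n_clusters):
    """Sum the sample weights assigned to each cluster (hypothetical helper)."""
    sizes = [0.0] * n_clusters
    for label, weight in zip(labels, weights):
        sizes[label] += weight
    return sizes


def clusters_out_of_bounds(sizes, size_min, size_max):
    """Indices of clusters whose total weight lies outside [size_min, size_max]."""
    return [k for k, s in enumerate(sizes) if s < size_min or s > size_max]


labels = [0, 0, 1, 1, 1]              # cluster index of each sample
weights = [3.0, 1.0, 1.0, 1.0, 0.5]   # per-sample weights

sizes = weighted_cluster_sizes(labels, weights, n_clusters=2)
violations = clusters_out_of_bounds(sizes, size_min=2.0, size_max=3.0)

print(sizes, violations)
```

In this example cluster 0 holds fewer samples than cluster 1 but more total weight, so it is the one that exceeds the upper bound. A count-based constraint would not flag it.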


