TST Check correct interactions of `class_weight` and `sample_weight`
See original GitHub issue

In scikit-learn, some estimators support `class_weight` and `sample_weight`.
It might be worth testing the correct interaction of those two types of weights, especially asserting that:
- setting a class's weight to zero is equivalent to excluding the samples associated with this class from the calibration, even when using non-uniform sample weights (see the sketch after this list);
- setting some sample weights to zero is equivalent to excluding those samples from the calibration, even when they are associated with non-uniform class weights.
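For illustration only (not part of the original issue), here is a minimal sketch of what the first assertion could look like, using `DecisionTreeClassifier` as an arbitrary pick from the list below; whether the final equality actually holds for each estimator is precisely what the proposed tests would verify.

```python
# Hedged sketch, not from the issue: zeroing the weight of class 0 should be
# equivalent to dropping the samples of class 0, even with non-uniform
# sample weights. The estimator and dataset are arbitrary choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=200, n_classes=3, n_informative=4, random_state=0
)
rng = np.random.RandomState(0)
sample_weight = rng.uniform(0.5, 2.0, size=y.shape[0])  # non-uniform weights

# Fit keeping all samples but giving class 0 a zero weight.
clf_zero_cw = DecisionTreeClassifier(
    class_weight={0: 0.0, 1: 1.0, 2: 1.0}, random_state=0
)
clf_zero_cw.fit(X, y, sample_weight=sample_weight)

# Fit after excluding the samples of class 0 instead.
mask = y != 0
clf_dropped = DecisionTreeClassifier(random_state=0)
clf_dropped.fit(X[mask], y[mask], sample_weight=sample_weight[mask])

# Desired invariance: both models should predict the same labels.
np.testing.assert_array_equal(clf_zero_cw.predict(X), clf_dropped.predict(X))
```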
Relevant interfaces:
- the main subclasses of `sklearn.tree.BaseDecisionTree` for classification, i.e.:
  - `sklearn.tree.DecisionTreeClassifier`
  - `sklearn.tree.ExtraTreeClassifier`
- the main subclasses of `sklearn.ensemble.BaseForest` for classification and embedding, i.e.:
  - `sklearn.ensemble.RandomTreesEmbedding`
  - `sklearn.ensemble.RandomForestClassifier`
  - `sklearn.ensemble.ExtraTreesClassifier`
- `sklearn.linear_model.LogisticRegression`
- `sklearn.linear_model.LogisticRegressionCV`
- `sklearn.calibration.CalibratedClassifierCV` after the merge of #17541
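As a quick reminder of the two interfaces involved (an illustrative snippet, not from the issue): `class_weight` is a constructor parameter, while `sample_weight` is passed to `fit`; for `CalibratedClassifierCV`, `class_weight` lives on the wrapped classifier.

```python
# Illustrative only: how the two kinds of weights are supplied to two of the
# estimators listed above.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

X = np.arange(8, dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
sample_weight = np.array([1.0, 2.0, 1.0, 0.5, 1.5, 1.0, 2.0, 0.5])

# class_weight is set at construction time, sample_weight at fit time.
LogisticRegression(class_weight={0: 1.0, 1: 2.0}).fit(
    X, y, sample_weight=sample_weight
)

# For CalibratedClassifierCV, class_weight belongs to the wrapped classifier,
# while sample_weight is forwarded through fit (supported after #17541).
CalibratedClassifierCV(LogisticRegression(class_weight="balanced"), cv=2).fit(
    X, y, sample_weight=sample_weight
)
```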
Issue Analytics
- Created 2 years ago
- Comments: 10 (9 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thank you, I think it's clear. I'll begin tomorrow; if I have some issues I'll come back to you.
I think we could probably directly add this as a common test (and maybe skip estimators that fail at first)? It's very similar in spirit to https://github.com/scikit-learn/scikit-learn/blob/46485a93beccd79d0e3563512e8385b3e5667524/sklearn/utils/estimator_checks.py#L972, so I imagine we could add a similar

`def check_class_weights_invariance(name, estimator_orig)`

and only run it if the estimator has a `class_weight` init parameter.

Thanks for raising this @jjerphan!
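For concreteness, here is one possible shape for such a common check, loosely modeled on the sample-weight invariance check linked above; the name matches the suggestion in the comment, but the dataset, the zeroed class, and the exact assertion are placeholders rather than the actual scikit-learn implementation.

```python
# Hypothetical sketch of the suggested common check; not the actual
# scikit-learn implementation.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.utils._testing import set_random_state


def check_class_weights_invariance(name, estimator_orig):
    """Zeroing a class's weight should behave like dropping that class."""
    X, y = make_classification(
        n_samples=100, n_classes=3, n_informative=4, random_state=0
    )
    rng = np.random.RandomState(0)
    sample_weight = rng.uniform(0.5, 2.0, size=y.shape[0])

    # Variant A: all samples kept, class 0 weighted to zero.
    est_zero = clone(estimator_orig).set_params(
        class_weight={0: 0.0, 1: 1.0, 2: 1.0}
    )
    set_random_state(est_zero, random_state=0)
    est_zero.fit(X, y, sample_weight=sample_weight)

    # Variant B: samples of class 0 removed from the training set.
    mask = y != 0
    est_dropped = clone(estimator_orig)
    set_random_state(est_dropped, random_state=0)
    est_dropped.fit(X[mask], y[mask], sample_weight=sample_weight[mask])

    # The two variants should make identical predictions.
    assert np.array_equal(est_zero.predict(X), est_dropped.predict(X)), (
        f"{name}: zero class_weight is not equivalent to dropping the class"
    )
```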