TST Check correct interactions of `class_weight` and `sample_weight`
See original GitHub issue

In scikit-learn, some estimators support `class_weight` and `sample_weight`.
It might be worth testing the correct interaction of those two types of weights, especially asserting that:
- setting a class's weight to zero is equivalent to excluding the samples associated with this class from the calibration, even when using non-uniform sample weights (see the sketch after this list);
- setting some sample weights to zero is equivalent to excluding those samples from the calibration, even when they are associated with non-uniform class weights.
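For illustration only (not part of the original issue), here is a minimal sketch of what the first assertion could look like, using `DecisionTreeClassifier` as an arbitrary pick from the list below; whether the final equality actually holds for each estimator is precisely what the proposed tests would verify.

```python
# Hedged sketch, not from the issue: zeroing the weight of class 0 should be
# equivalent to dropping the samples of class 0, even with non-uniform
# sample weights. The estimator and dataset are arbitrary choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=200, n_classes=3, n_informative=4, random_state=0
)
rng = np.random.RandomState(0)
sample_weight = rng.uniform(0.5, 2.0, size=y.shape[0])  # non-uniform weights

# Fit keeping all samples but giving class 0 a zero weight.
clf_zero_cw = DecisionTreeClassifier(
    class_weight={0: 0.0, 1: 1.0, 2: 1.0}, random_state=0
)
clf_zero_cw.fit(X, y, sample_weight=sample_weight)

# Fit after excluding the samples of class 0 instead.
mask = y != 0
clf_dropped = DecisionTreeClassifier(random_state=0)
clf_dropped.fit(X[mask], y[mask], sample_weight=sample_weight[mask])

# Desired invariance: both models should predict the same labels.
np.testing.assert_array_equal(clf_zero_cw.predict(X), clf_dropped.predict(X))
```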
Relevant interfaces:
- the main subclasses of `sklearn.tree.BaseDecisionTree` for classification, i.e.:
  - `sklearn.tree.DecisionTreeClassifier`
  - `sklearn.tree.ExtraTreeClassifier`
- the main subclasses of `sklearn.ensemble.BaseForest` for classification and embedding, i.e.:
  - `sklearn.ensemble.RandomTreesEmbedding`
  - `sklearn.ensemble.RandomForestClassifier`
  - `sklearn.ensemble.ExtraTreesClassifier`
- `sklearn.linear_model.LogisticRegression`
- `sklearn.linear_model.LogisticRegressionCV`
- `sklearn.calibration.CalibratedClassifierCV` after the merge of #17541
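As a quick reminder of the two interfaces involved (an illustrative snippet, not from the issue): `class_weight` is a constructor parameter, while `sample_weight` is passed to `fit`; for `CalibratedClassifierCV`, `class_weight` lives on the wrapped classifier.

```python
# Illustrative only: how the two kinds of weights are supplied to two of the
# estimators listed above.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

X = np.arange(8, dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
sample_weight = np.array([1.0, 2.0, 1.0, 0.5, 1.5, 1.0, 2.0, 0.5])

# class_weight is set at construction time, sample_weight at fit time.
LogisticRegression(class_weight={0: 1.0, 1: 2.0}).fit(
    X, y, sample_weight=sample_weight
)

# For CalibratedClassifierCV, class_weight belongs to the wrapped classifier,
# while sample_weight is forwarded through fit (supported after #17541).
CalibratedClassifierCV(LogisticRegression(class_weight="balanced"), cv=2).fit(
    X, y, sample_weight=sample_weight
)
```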
Issue Analytics
- Created 2 years ago
- Comments: 10 (9 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thank you, I think it's clear. I'll begin tomorrow; if I have some issues I'll come back to you.
I think we could probably directly add this as a common test (and maybe skip estimators that fail at first)? It's very similar in spirit to https://github.com/scikit-learn/scikit-learn/blob/46485a93beccd79d0e3563512e8385b3e5667524/sklearn/utils/estimator_checks.py#L972, so I imagine we could add a similar

`def check_class_weights_invariance(name, estimator_orig)`

and only run it if the estimator has a `class_weight` init parameter.

Thanks for raising this @jjerphan!
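For concreteness, here is one possible shape for such a common check, loosely modeled on the sample-weight invariance check linked above; the name matches the suggestion in the comment, but the dataset, the zeroed class, and the exact assertion are placeholders rather than the actual scikit-learn implementation.

```python
# Hypothetical sketch of the suggested common check; not the actual
# scikit-learn implementation.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.utils._testing import set_random_state


def check_class_weights_invariance(name, estimator_orig):
    """Zeroing a class's weight should behave like dropping that class."""
    X, y = make_classification(
        n_samples=100, n_classes=3, n_informative=4, random_state=0
    )
    rng = np.random.RandomState(0)
    sample_weight = rng.uniform(0.5, 2.0, size=y.shape[0])

    # Variant A: all samples kept, class 0 weighted to zero.
    est_zero = clone(estimator_orig).set_params(
        class_weight={0: 0.0, 1: 1.0, 2: 1.0}
    )
    set_random_state(est_zero, random_state=0)
    est_zero.fit(X, y, sample_weight=sample_weight)

    # Variant B: samples of class 0 removed from the training set.
    mask = y != 0
    est_dropped = clone(estimator_orig)
    set_random_state(est_dropped, random_state=0)
    est_dropped.fit(X[mask], y[mask], sample_weight=sample_weight[mask])

    # The two variants should make identical predictions.
    assert np.array_equal(est_zero.predict(X), est_dropped.predict(X)), (
        f"{name}: zero class_weight is not equivalent to dropping the class"
    )
```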