Deprecate normalize parameter in `calibration_curve`
See original GitHub issue.

Describe the workflow you want to enable

Similar to the behavior of `calibration_curve`, I would like to be able to set `CalibrationDisplay.from_predictions(normalize=True)`.
Describe your proposed solution

Add a keyword argument `normalize` to `CalibrationDisplay.from_predictions` and pass it to `calibration_curve`.
Describe alternatives you’ve considered, if relevant

I could manually normalize the values, but given that the functionality is already present in the underlying `calibration_curve`, I would say this is a relatively easy thing to adjust on the scikit-learn side.
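The manual workaround mentioned above can be sketched as follows. This is a minimal illustration, not code from the issue: the scores and labels are made-up placeholder values, and the min-max scaling reproduces by hand what `normalize=True` used to do inside `calibration_curve`.

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Hypothetical uncalibrated decision scores and binary labels, for illustration.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
y_score = np.array([-2.0, -1.5, 0.5, 3.0, 2.5, -0.5, 1.0, -3.0, 2.0, 0.0])

# Min-max normalize the scores into [0, 1] explicitly, instead of
# relying on calibration_curve(..., normalize=True).
y_prob = (y_score - y_score.min()) / (y_score.max() - y_score.min())

# The normalized values can now be passed as probabilities.
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=3)
```

Doing the scaling explicitly makes the (strong) assumption of a linear score-to-probability mapping visible in user code rather than hiding it behind a keyword argument.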
Additional context
No response
Issue Analytics
- State:
- Created: 2 years ago
- Comments: 8 (6 by maintainers)
Top GitHub Comments
OK, I went into the discussion in the PR and discussed it as well with @ogrisel IRL.

Methodologically, a calibration curve should only be used with probabilities. Using `calibration_curve` with `normalize=True` is equivalent to a naive linear calibration with additional clipping for values above/below the max/min. So I think that we don’t want people using this type of naive calibration without doing it explicitly.

I think that in your case it boils down to how you want to relate the risk scores to a probability. If you want a linear mapping, then `normalize=True` was the right move. However, you could as well use a sigmoid transform centred on score 5 to express a probability. Since I don’t have the background on what the scores represent, it is difficult to say what the right choice is for transforming the scores into probabilities, but it shows that using the naive normalization implicitly could be dangerous in general. I would be in favour of deprecating `normalize` as well, with an explicit warning that encourages users to provide probabilities, which could be obtained through a calibrated classifier if the output is a decision function.

@glemaitre we had some comments about this: https://github.com/scikit-learn/scikit-learn/pull/17443/files#r670597315
Essentially, we thought that `from_estimator` should only accept estimators with a `predict_proba` method, since we are sure `predict_proba` will output a true probability. We had also decided that we did not want to (/that it was not a good idea to) pass the output of `decision_function` to `calibration_curve`, which probably boils down to @NicolasHug’s comment here: https://github.com/scikit-learn/scikit-learn/pull/17443#discussion_r440103386, though that whole thread is probably relevant.