Calibration and Refinement loss for Brier score loss
Describe the workflow you want to enable
As per Brier score User Guide:
“Only when refinement loss remains the same does a lower Brier score loss always mean better calibration”
But the current API doesn’t provide refinement loss or calibration loss, which makes it hard to measure the quality of probabilistic estimates.
Describe your proposed solution
I propose implementing the approach described in the paper [Flach2008], a reference that is conveniently already cited in the User Guide: namely, estimating calibration loss and refinement loss without any binning, directly on the raw data/predictions.
If the community decides this is a valuable addition, I humbly present my implementation. It is still WIP in terms of compliance with scikit-learn codebase conventions and handling of corner cases, but it is functioning in essence.
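To illustrate what a binning-free decomposition can look like, here is a minimal sketch in plain Python. It is not the implementation linked above and does not reproduce the [Flach2008] procedure; it only assumes that samples are grouped by identical predicted probability, in which case the Brier score splits exactly into a calibration term and a refinement term. The function name `brier_decomposition` and the toy data are hypothetical.

```python
from collections import defaultdict


def brier_decomposition(y_true, y_prob):
    """Return (calibration_loss, refinement_loss) for binary labels.

    Hypothetical helper: samples are grouped by *identical* predicted
    probability, so no bin edges or bin counts are involved.
    """
    groups = defaultdict(list)
    for y, p in zip(y_true, y_prob):
        groups[p].append(y)

    n = len(y_true)
    calibration = 0.0
    refinement = 0.0
    for p, labels in groups.items():
        n_j = len(labels)
        pos_rate = sum(labels) / n_j  # empirical positive rate for this score
        calibration += n_j * (p - pos_rate) ** 2
        refinement += n_j * pos_rate * (1 - pos_rate)
    return calibration / n, refinement / n


# Toy data (made up for illustration only).
y_true = [0, 0, 1, 1, 1, 0, 1, 1]
y_prob = [0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8, 0.5]

cal, ref = brier_decomposition(y_true, y_prob)
brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
print(cal, ref, brier)  # cal + ref equals brier up to rounding
```

One caveat of this naive grouping: if every predicted probability is distinct, each group holds a single sample, the refinement term is zero, and the calibration term equals the Brier score. A practical estimator therefore needs some principled way of pooling scores, which this sketch does not attempt.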
Describe alternatives you’ve considered, if relevant
Visual comparison of calibration curves (not exact, not scalable)
Making custom ad hoc metrics to estimate probability errors (fragile, dubious)
Using sklearn.calibration.calibration_curve (requires binning [1])
Using #11096 (requires binning [1])
[1] The problem with binning is described in [Bella2012]. To quote:
The problem of using bins is that if too few bins are defined, the real probabilities are not properly detailed to give an accurate evaluation. If too many bins are defined, the real probabilities are not properly estimated. A partial solution to this problem is to make the bins overlap.
And using overlapping bins seems like an additional degree of freedom: one more parameter to keep in mind, tune, and argue about with colleagues.
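The sensitivity to the number of bins can be seen on a tiny example. The sketch below assumes equal-width bins on [0, 1] and a simple binned calibration loss (weighted squared gap between the mean prediction and the positive rate in each bin). The helper `binned_calibration_loss` and the data are hypothetical and are not taken from #11096 or from scikit-learn.

```python
def binned_calibration_loss(y_true, y_prob, n_bins):
    """Calibration loss with equal-width bins on [0, 1] (hypothetical helper)."""
    # Per bin: [sample count, sum of predicted probabilities, sum of labels]
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[idx][0] += 1
        bins[idx][1] += p
        bins[idx][2] += y

    n = len(y_true)
    loss = 0.0
    for count, prob_sum, label_sum in bins:
        if count:  # skip empty bins
            loss += count * (prob_sum / count - label_sum / count) ** 2
    return loss / n


# Toy data (made up for illustration only).
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.3, 0.45, 0.55, 0.7, 0.9]

# Same predictions, different bin counts, different "calibration loss".
losses = {k: binned_calibration_loss(y_true, y_prob, k) for k in (1, 2, 10)}
for k, value in losses.items():
    print(k, value)
```

On this data the estimate grows as the bins get finer, from essentially zero with a single bin to the full Brier score once every sample sits in its own bin, so the reported number depends on a parameter that has nothing to do with the model.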
Additional context
I’m aware that contributors to #11096 have done work on implementing calibration loss with binning and on clarifying the docs on calibration in general. So I’d like to get feedback on that, and I’m open to suggestions on how to proceed.
Top GitHub Comments
@ColdTeapot273K @lorentzenchr if you already have a good understanding of both approaches, I would love you to share your analysis of the pros and cons of each.
See https://github.com/scikit-learn/scikit-learn/issues/21774#issuecomment-1128635219, arxiv link is https://arxiv.org/abs/2008.03033.
This issue will be solved if we solve the broader (but not much more difficult) #23767.