
Calibration and Refinement loss for Brier score loss


Describe the workflow you want to enable

As per Brier score User Guide:

“Only when refinement loss remains the same does a lower Brier score loss always mean better calibration”

But the current API provides neither refinement loss nor calibration loss, which makes it hard to measure the quality of probabilistic estimates.

Describe your proposed solution

My proposed solution is to implement the approach described in the paper [Flach2008], a reference that the User Guide conveniently already cites.

Namely, estimating calibration loss and refinement loss on the raw data and predictions, without any binning.

If the community decides this is a valuable addition, I humbly present my implementation. It is still WIP with respect to scikit-learn codebase conventions and corner-case handling, but it works in essence.
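For readers who want a concrete picture of a binning-free decomposition, here is a minimal sketch. It is not the issue author's implementation (which is not reproduced on this page) and it is not a scikit-learn API. It assumes recalibration by pool-adjacent-violators (isotonic regression), the kind of approach linked in the comments further down, and it assumes the following definitions: refinement loss is the Brier score of the recalibrated probabilities, and calibration loss is whatever remains of the original Brier score. These may differ in detail from the definitions in [Flach2008]. All function names are hypothetical.

```python
from itertools import groupby


def brier(probs, labels):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def pav_calibrate(probs, labels):
    """Recalibrate scores with pool-adjacent-violators (assumed technique).

    Each prediction is mapped to the empirical positive rate of the pooled
    block its score ends up in. Tied scores are pooled before merging.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    blocks = []  # each block: [sum of labels, count, member indices]
    for _, group in groupby(order, key=lambda i: probs[i]):
        idx = list(group)
        blocks.append([sum(labels[i] for i in idx), len(idx), idx])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]
        ):
            s, n, members = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
            blocks[-1][2] += members
    calibrated = [0.0] * len(probs)
    for s, n, members in blocks:
        for i in members:
            calibrated[i] = s / n
    return calibrated


def brier_decomposition(probs, labels):
    """Return (calibration_loss, refinement_loss, brier_score), no binning."""
    total = brier(probs, labels)
    refinement = brier(pav_calibrate(probs, labels), labels)
    calibration = total - refinement  # assumed definition: the remainder
    return calibration, refinement, total


if __name__ == "__main__":
    probs = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
    labels = [0, 1, 0, 0, 1, 1]
    cal, ref, total = brier_decomposition(probs, labels)
    print(f"calibration={cal:.4f} refinement={ref:.4f} brier={total:.4f}")
```

The sketch is dependency-free so that the pooling step is visible. In practice, `sklearn.isotonic.IsotonicRegression` could play the role of `pav_calibrate`. With these definitions the two terms sum to the Brier score by construction, and the calibration term is non-negative because the isotonic fit minimizes squared error over all monotone transformations of the scores, the identity included.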

Describe alternatives you’ve considered, if relevant

Visual comparison of calibration curves: not exact, not scalable

Making custom ad hoc metrics to estimate probability errors: fragile, dubious

Using sklearn.calibration.calibration_curve: requires binning¹

Using #11096: requires binning¹

¹ The problem with binning is described in [Bella2012]. To quote:

The problem of using bins is that if too few bins are defined, the real probabilities are not properly detailed to give an accurate evaluation. If too many bins are defined, the real probabilities are not properly estimated. A partial solution to this problem is to make the bins overlap.

And using overlapping bins seems like an additional degree of freedom: one more parameter to keep in mind, tune, and argue about with colleagues.
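To make the sensitivity to the bin count concrete, here is a small sketch of a common binned calibration-loss estimate (the weighted squared gap between mean prediction and observed positive rate per bin). It is an illustration only: the function name, the equal-width binning, and the toy data are assumptions and do not come from #11096 or from scikit-learn.

```python
def binned_calibration_loss(probs, labels, n_bins):
    """Binned calibration loss with equal-width bins over [0, 1] (a sketch)."""
    sums_p = [0.0] * n_bins
    sums_y = [0.0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        sums_p[b] += p
        sums_y[b] += y
        counts[b] += 1
    # Weighted squared gap between mean prediction and positive rate per bin.
    return sum(
        c * (sp / c - sy / c) ** 2
        for sp, sy, c in zip(sums_p, sums_y, counts)
        if c
    ) / len(probs)


if __name__ == "__main__":
    probs = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
    labels = [0, 1, 0, 0, 1, 1]
    for n_bins in (1, 2, 10):
        print(n_bins, binned_calibration_loss(probs, labels, n_bins))
```

On this toy sample the same predictions yield very different estimates depending on `n_bins`. With ten bins each prediction sits alone in its bin, so the estimate collapses to the Brier score itself, which matches the "too many bins" failure mode in the quote.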

Additional context

I’m aware that contributors to #11096 have worked on implementing calibration loss with binning and on clarifying the calibration docs in general. So I’d like to get feedback on this proposal, and I’m open to suggestions on how to proceed.

UPD: Closes #18268, #21718

Issue Analytics

Created 2 years ago
Reactions: 2
Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction

ogrisel commented, May 17, 2022

@ColdTeapot273K @lorentzenchr if you already have a good understanding of both approaches, I would love you to share your analysis of the pros and cons of each.

0 reactions

lorentzenchr commented, Jun 27, 2022

could you clarify what do you refer to as “CORP approach”

See https://github.com/scikit-learn/scikit-learn/issues/21774#issuecomment-1128635219, arxiv link is https://arxiv.org/abs/2008.03033.

This issue will be solved if we solve the broader (but not much more difficult) #23767.
