Calibration and Refinement loss for Brier score loss

Describe the workflow you want to enable

As per Brier score User Guide:

“Only when refinement loss remains the same does a lower Brier score loss always mean better calibration”

But the current API doesn’t provide refinement loss or calibration loss, which makes it hard to measure the quality of probabilistic estimates.
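To make the two terms concrete, here is a minimal sketch of the textbook decomposition, in which samples are grouped by identical predicted probability. The function name `brier_decomposition_by_value` and the toy data are hypothetical and are not part of the scikit-learn API.

```python
import numpy as np


def brier_decomposition_by_value(y_true, y_prob):
    """Split the Brier score into (calibration loss, refinement loss).

    Hypothetical helper: samples are grouped by exactly equal predicted
    probability, so no bin edges have to be chosen.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    n_samples = y_true.size
    calibration = 0.0
    refinement = 0.0
    for p in np.unique(y_prob):
        mask = y_prob == p
        weight = mask.sum() / n_samples      # share of samples in this group
        observed = y_true[mask].mean()       # empirical positive rate in the group
        calibration += weight * (p - observed) ** 2
        refinement += weight * observed * (1.0 - observed)
    return calibration, refinement


# Toy data (made up for illustration): three distinct predicted values.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 1])
y_prob = np.array([0.2, 0.2, 0.2, 0.7, 0.7, 0.7, 0.9, 0.9])

calibration, refinement = brier_decomposition_by_value(y_true, y_prob)
brier = np.mean((y_prob - y_true) ** 2)
# With this grouping the two terms add up to the Brier score exactly.
```

Note that when every predicted probability is distinct, each group holds a single sample, so the refinement term collapses to zero and the calibration term equals the Brier score itself. That is why some form of pooling is needed for continuous scores.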

Describe your proposed solution

My proposed solution is to implement the approach described in the paper [Flach2008], a reference that is conveniently already mentioned in the User Guide.

Namely, estimating Calibration loss and Refinement loss without any binning, on raw data/predictions.

If the community decides this is a valuable addition, I humbly present my implementation, which is WIP in terms of Scikit-Learn codebase conventions compliance and corner case processing, but is functioning in essence.

Describe alternatives you’ve considered, if relevant

Visual comparison of calibration curves (not exact, not scalable)

Making custom ad hoc metrics to estimate probability errors (fragile, dubious)

Using sklearn.calibration.calibration_curve (requires binning [1])

Using #11096 (requires binning [1])

[1] The problem with binning is described in [Bella2012]. To quote:

The problem of using bins is that if too few bins are defined, the real probabilities are not properly detailed to give an accurate evaluation. If too many bins are defined, the real probabilities are not properly estimated. A partial solution to this problem is to make the bins overlap.

And using overlapping bins seems like an additional degree of freedom, one more parameter to keep in mind, tune, and argue about with colleagues.
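As a rough illustration of the "too many bins" half of that quote, the sketch below estimates a binned calibration loss on synthetic data that is perfectly calibrated by construction, so the true calibration loss is zero. The helper `binned_calibration_loss`, the equal-width binning, and the simulated data are all assumptions made for this example.

```python
import numpy as np


def binned_calibration_loss(y_true, y_prob, n_bins):
    """Calibration loss estimated with equal-width bins on [0, 1].

    Hypothetical helper, not a scikit-learn function.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.searchsorted(edges[1:-1], y_prob)
    counts = np.bincount(bin_ids, minlength=n_bins)
    sum_prob = np.bincount(bin_ids, weights=y_prob, minlength=n_bins)
    sum_true = np.bincount(bin_ids, weights=y_true, minlength=n_bins)
    nonempty = counts > 0
    mean_prob = sum_prob[nonempty] / counts[nonempty]
    mean_true = sum_true[nonempty] / counts[nonempty]
    # Weighted squared gap between mean prediction and observed rate per bin.
    return np.sum(counts[nonempty] * (mean_prob - mean_true) ** 2) / y_prob.size


# Synthetic data: labels are drawn with exactly the predicted probability.
rng = np.random.default_rng(0)
y_prob = rng.uniform(0.0, 1.0, size=500)
y_true = rng.binomial(1, y_prob)

losses = {
    n_bins: binned_calibration_loss(y_true, y_prob, n_bins)
    for n_bins in (2, 10, 100, 1000)
}
for n_bins, loss in losses.items():
    print(f"n_bins={n_bins:5d}  estimated calibration loss={loss:.4f}")
```

With far more bins than the data can support, most bins hold one or two samples, and the estimate mostly measures label noise rather than miscalibration. The reported number therefore depends on a tuning choice, not only on the model.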

Additional context

I’m aware that contributors to #11096 have done work on implementing calibration loss with binning and on clarifying the docs on calibration in general. So I’d like to get feedback on that, and I’m open to suggestions on how to proceed.

UPD: Closes #18268, #21718

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 2
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
ogrisel commented, May 17, 2022

@ColdTeapot273K @lorentzenchr if you already have a good understanding of both approaches, I would love you to share your analysis of the pros and cons of each.

0 reactions
lorentzenchr commented, Jun 27, 2022

could you clarify what do you refer to as “CORP approach”

See https://github.com/scikit-learn/scikit-learn/issues/21774#issuecomment-1128635219, arxiv link is https://arxiv.org/abs/2008.03033.

This issue will be solved if we solve the broader (but not much more difficult) #23767.
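For readers unfamiliar with the CORP approach from the arXiv link above: it recalibrates the predicted probabilities with isotonic regression (the pool-adjacent-violators algorithm) and splits the mean score into miscalibration (MCB), discrimination (DSC), and uncertainty (UNC), with score = MCB - DSC + UNC. The following is only a sketch for the Brier score; the function name `corp_brier_decomposition` and the synthetic data are assumptions, and this is not the implementation proposed in this issue or in #23767.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression


def corp_brier_decomposition(y_true, y_prob):
    """Return (MCB, DSC, UNC) for the Brier score.

    Hypothetical helper: recalibration uses isotonic regression, so no
    bins have to be chosen by the user.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    # Recalibrated probabilities: best monotone map of y_prob onto y_true.
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    y_cal = iso.fit(y_prob, y_true).predict(y_prob)

    score = np.mean((y_prob - y_true) ** 2)               # original forecast
    score_cal = np.mean((y_cal - y_true) ** 2)            # recalibrated forecast
    score_ref = np.mean((y_true.mean() - y_true) ** 2)    # constant base-rate forecast

    mcb = score - score_cal       # miscalibration (plays the role of calibration loss)
    dsc = score_ref - score_cal   # discrimination
    unc = score_ref               # uncertainty
    return mcb, dsc, unc


# Synthetic, deliberately miscalibrated data (assumption for illustration):
# the true event probability is y_prob ** 2, not y_prob.
rng = np.random.default_rng(0)
y_prob = rng.uniform(0.0, 1.0, size=1000)
y_true = rng.binomial(1, y_prob ** 2)

mcb, dsc, unc = corp_brier_decomposition(y_true, y_prob)
brier = np.mean((y_prob - y_true) ** 2)
print(f"MCB={mcb:.4f}  DSC={dsc:.4f}  UNC={unc:.4f}  Brier={brier:.4f}")
```

In these terms, the Brier score of the recalibrated forecast, UNC - DSC, plays the role of the refinement loss, so the two-term split discussed in this issue can be read off the same quantities.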
