
Possible Feature Request(?): AUC metric and optimism via bootstrapping

See original GitHub issue

Describe the workflow you want to enable

When computing AUC with confidence intervals, one option is to use cross-validation hold-out data (e.g. any of the K-Fold splitters in sklearn).

However, this procedure can be problematic, especially in low-sample settings. There, it is sometimes suggested to instead use bootstrapping to estimate the “optimism” of the model, which can then be used to correct the roc_auc_score.
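
For context, a minimal sketch of the cross-validation route as it exists today (my own illustration, not part of the original issue; the synthetic dataset and logistic regression are placeholders). A rough interval can be read off the spread of the per-fold AUCs:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=80, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# per-fold hold-out AUC; mean and spread give a crude interval
fold_aucs = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                            cv=cv, scoring="roc_auc")
print(fold_aucs.mean(), fold_aucs.std())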

Describe your proposed solution

Perhaps a compute_metric_optimism() function could live under the inspection or utils module? This could be extended to metrics beyond just AUC.

Some pseudo-code:

from sklearn.base import BaseEstimator, clone
import numpy as np


def compute_metric_optimism(estimator: BaseEstimator, X, y, n_bootstrap: int,
                            scoring, random_state=None):
    # scoring is a scorer callable, e.g. sklearn.metrics.get_scorer("roc_auc")
    rng = np.random.RandomState(random_state)

    # compute the original (apparent) metric on the original data
    estimator.fit(X, y)
    orig_score = scoring(estimator, X, y)

    optimism = 0.0
    for boot_idx in range(n_bootstrap):
        # fit a fresh clone of the model on a bootstrap sample
        idx = rng.randint(0, len(X), size=len(X))
        boot_est = clone(estimator).fit(X[idx], y[idx])

        # compute the score on the bootstrap sample and on the original data
        boot_score = scoring(boot_est, X[idx], y[idx])
        test_score = scoring(boot_est, X, y)

        # compute optimism: the average gap over the bootstrap samples
        optimism += (boot_score - test_score) / n_bootstrap

    # compute the optimism-adjusted metric
    return orig_score - optimism
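
For illustration only (none of this appears in the original issue; the data and estimator are placeholders), a call could look like:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

X, y = make_classification(n_samples=80, random_state=0)
adjusted_auc = compute_metric_optimism(
    LogisticRegression(max_iter=1000), X, y,
    n_bootstrap=200, scoring=get_scorer("roc_auc"), random_state=0)
print(adjusted_auc)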

Describe alternatives you’ve considered, if relevant

If maintainers feel it is not necessary, it is always possible to do this outside of sklearn, either with third-party packages or a self-implemented version.

Additional context

For some reference on the procedure in R: https://thestatsgeek.com/2014/10/04/adjusting-for-optimismoverfitting-in-measures-of-predictive-ability-using-bootstrapping/

Happy to attempt a PR if this gets the go-ahead.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
adrinjalali commented, Jul 10, 2020

Considering cross_validate doesn’t require the splits to be disjoint sets, and that it returns the test scores as well as the train scores, a very short example in the example gallery would be welcome.
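
To make that concrete, a rough sketch of what such an example might do (my own illustration, not taken from the example gallery; the data and estimator are placeholders): pass bootstrap resamples as the cv splits, score each fitted model on the full data, and read the optimism off the train/test score gap:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=80, random_state=0)
rng = np.random.RandomState(0)

# each "split" trains on a bootstrap resample and is scored on the full data
splits = [(rng.randint(0, len(X), size=len(X)), np.arange(len(X)))
          for _ in range(200)]

res = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=splits,
                     scoring="roc_auc", return_train_score=True)
optimism = (res["train_score"] - res["test_score"]).mean()
print(optimism)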

0 reactions
adrinjalali commented, Oct 20, 2021

Closing as WONTFIX; please refer to the comments on the PR.
