Function to get scorers for task
I would like to see a utility that constructs a set of applicable scorers for a particular task, returning a Mapping from string to callable scorer. It will be hard to get this API right the first time. [Maybe this should initially be developed outside this project and contributed to scikit-learn-contrib, but I think it reduces the risk of mis-specifying scorers, so it's of benefit to this project.]
The user will be able to select a subset of the scorers, either with a dict comprehension or with some specialised methods or function parameters. Initially it wouldn’t be efficient to run all these scorers, but hopefully we can do something to fix #10802 😐.
Let's take for instance a binary classification task. For binary `y`, the function `get_applicable_scorers(y, pos_label='yes')` might produce something like:
```python
{
    'accuracy': make_scorer(accuracy_score),
    'balanced_accuracy': make_scorer(balanced_accuracy_score),
    'matthews_corrcoef': make_scorer(matthews_corrcoef),
    'cohens_kappa': make_scorer(cohen_kappa_score),
    'precision': make_scorer(precision_score, pos_label='yes'),
    'recall': make_scorer(recall_score, pos_label='yes'),
    'f1': make_scorer(f1_score, pos_label='yes'),
    'f0.5': make_scorer(fbeta_score, pos_label='yes', beta=0.5),
    'f2': make_scorer(fbeta_score, pos_label='yes', beta=2),
    'specificity': ...,
    'miss_rate': ...,
    ...
    'roc_auc': make_scorer(roc_auc_score, needs_threshold=True),
    'average_precision': make_scorer(average_precision_score, needs_threshold=True),
    'neg_log_loss': make_scorer(log_loss, needs_proba=True, greater_is_better=False),
    'neg_brier_score_loss': make_scorer(brier_score_loss, needs_proba=True, greater_is_better=False),
}
```
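To make the shape of the API concrete, here is a minimal sketch of how the binary branch could be built. The function name, signature, and exact scorer set are placeholders, not a settled design:

```python
from sklearn.metrics import (
    accuracy_score, average_precision_score, balanced_accuracy_score,
    brier_score_loss, cohen_kappa_score, f1_score, fbeta_score, log_loss,
    make_scorer, matthews_corrcoef, precision_score, recall_score,
    roc_auc_score,
)
from sklearn.utils.multiclass import type_of_target


def get_applicable_scorers(y, pos_label=1):
    """Build a dict of scorers applicable to the task implied by y (sketch)."""
    if type_of_target(y) != 'binary':
        raise NotImplementedError('only the binary case is sketched here')
    # Metrics computed from hard predictions, no pos_label needed.
    scorers = {
        'accuracy': make_scorer(accuracy_score),
        'balanced_accuracy': make_scorer(balanced_accuracy_score),
        'matthews_corrcoef': make_scorer(matthews_corrcoef),
        'cohens_kappa': make_scorer(cohen_kappa_score),
    }
    # Metrics parametrised by the positive class.
    for name, func in [('precision', precision_score),
                       ('recall', recall_score),
                       ('f1', f1_score)]:
        scorers[name] = make_scorer(func, pos_label=pos_label)
    for beta in (0.5, 2):
        scorers['f%g' % beta] = make_scorer(fbeta_score, pos_label=pos_label,
                                            beta=beta)
    # Metrics needing a continuous score or a probability estimate.
    scorers['roc_auc'] = make_scorer(roc_auc_score, needs_threshold=True)
    scorers['average_precision'] = make_scorer(average_precision_score,
                                               needs_threshold=True)
    scorers['neg_log_loss'] = make_scorer(log_loss, needs_proba=True,
                                          greater_is_better=False)
    scorers['neg_brier_score_loss'] = make_scorer(brier_score_loss,
                                                  needs_proba=True,
                                                  greater_is_better=False)
    return scorers
```

Selecting a subset would then be an ordinary dict comprehension:

```python
scorers = get_applicable_scorers(y, pos_label='yes')
ranking_only = {name: s for name, s in scorers.items()
                if name in ('roc_auc', 'average_precision')}
```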
Doing the same for multiclass classification would pass `labels` as appropriate, and would optionally produce per-class binary metrics as well as overall multiclass metrics.
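Purely for illustration, the per-class scorers could be generated by pinning `labels` to a single class: with `labels=[label]` and `average='macro'`, the metric reduces to that class's one-vs-rest score. A hypothetical generator:

```python
import numpy as np
from sklearn.metrics import f1_score, make_scorer, precision_score, recall_score


def multiclass_scorers(y):
    """Overall plus per-class scorers for a multiclass y (hypothetical helper)."""
    labels = list(np.unique(y))
    scorers = {
        # Overall multiclass metrics, with the label set fixed up front.
        'f1_macro': make_scorer(f1_score, labels=labels, average='macro'),
        'f1_micro': make_scorer(f1_score, labels=labels, average='micro'),
    }
    for label in labels:
        for name, func in [('precision', precision_score),
                           ('recall', recall_score),
                           ('f1', f1_score)]:
            # labels=[label] with average='macro' yields the one-vs-rest
            # score for this single class.
            scorers['%s_%s' % (name, label)] = make_scorer(
                func, labels=[label], average='macro')
    return scorers
```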
I'm not sure how `sample_weight` fits in here, but ha! we still don't support weighted scoring in cross validation (#1574), so let's not worry about that.
Top GitHub Comments
Yes, you're right, the estimator might be useful to determine `predict_proba` or `return_std` support. I suppose that most likely you will need the model too; having the model itself, you would be able to exclude/include scorers.
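A rough sketch of that idea, duck-typing on the estimator; the helper name is made up and the scorer names are the ones from the example above:

```python
def filter_scorers_for_estimator(scorers, estimator):
    """Drop scorers the given estimator cannot serve (hypothetical helper)."""
    out = dict(scorers)
    if not hasattr(estimator, 'predict_proba'):
        # Probability-based scorers require predict_proba.
        for name in ('neg_log_loss', 'neg_brier_score_loss'):
            out.pop(name, None)
    if not (hasattr(estimator, 'decision_function')
            or hasattr(estimator, 'predict_proba')):
        # Threshold-based scorers need some continuous output.
        for name in ('roc_auc', 'average_precision'):
            out.pop(name, None)
    return out
```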