Integrate `scikit-learn` metrics into `evaluate`
Summary
We want to support the wide range of metrics implemented in `scikit-learn` in `evaluate`. Besides expanding the capabilities of `evaluate`, this gives users from the `scikit-learn` ecosystem access to useful tools in `evaluate`, such as pushing results to the Hub or evaluating models end-to-end with the `evaluator` classes. As a bonus, all metrics will get an interactive widget that can be embedded in various places such as the docs.
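For example, the `evaluator` classes already make end-to-end evaluation a few lines of code. The sketch below follows the usual text-classification evaluator usage; the model and dataset names are just placeholders:

```python
# Sketch of the existing evaluator workflow that sklearn-backed metrics
# would plug into; model and dataset below are placeholders.
import evaluate
from datasets import load_dataset

task_evaluator = evaluate.evaluator("text-classification")
data = load_dataset("imdb", split="test[:100]")
results = task_evaluator.compute(
    model_or_pipeline="distilbert-base-uncased-finetuned-sst-2-english",
    data=data,
    metric="accuracy",
    label_mapping={"NEGATIVE": 0, "POSITIVE": 1},
)
print(results)  # e.g. {"accuracy": ...}
```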
Goal
The goal of this integration should be that metrics from `scikit-learn` can be loaded from `evaluate` with the following API:
```python
import evaluate

metric = evaluate.load("sklearn/accuracy")
metric.compute(predictions=[0, 1, 1], references=[1, 1, 0])
```
How it can be done
For the integration we could build a script that goes through all metrics of the `scikit-learn` repository, automatically builds the metric repositories in the `evaluate` format, and pushes them to the Hub. This could be a script that’s executed via a GitHub Action whenever a change is pushed to main, similar to how it’s done for the internal modules (see here). A rough sketch of what such a script could look like is shown below.
Besides the function, its arguments, and the input/output format, we can also use the docs to populate the `gradio` widget on the Hub. See the Accuracy module as an example of how the metrics could be displayed.
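As a small illustration of what the docs provide, the docstring and signature of each `scikit-learn` metric can be read with the `inspect` module; `accuracy_score` is just an example here:

```python
import inspect
from sklearn.metrics import accuracy_score

# The docstring can seed the README / widget description...
description = inspect.getdoc(accuracy_score)
print(description.splitlines()[0])  # "Accuracy classification score."

# ...and the signature gives the argument names and defaults for the widget inputs.
params = inspect.signature(accuracy_score).parameters
print(list(params))  # ['y_true', 'y_pred', 'normalize', 'sample_weight']
```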
Top GitHub Comments
Okay, sounds good. I tried listing the scikit-learn metrics with the inspect module and it works very well:
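Something along these lines (a sketch, not necessarily the exact snippet used in the comment) produces such a listing:

```python
# List the metric-style functions exposed by sklearn.metrics.
import inspect
from sklearn import metrics

metric_functions = sorted(
    name
    for name, obj in inspect.getmembers(metrics, inspect.isfunction)
    if name.endswith(("_score", "_error", "_loss"))
)
print(metric_functions)  # ['accuracy_score', 'adjusted_mutual_info_score', ...]
```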
I have some ideas on how this could be done. I’ll hopefully start drafting a PR this week and can tag you on it, @Mouhanedg56!