
Feature: standardize inputs/outputs of metrics


Currently there are several different input/output formats possible in Metrics. We should standardize them as much as possible while respecting the following principles:

  • inputs/outputs are easy to understand and use
  • outputs are compatible with other frameworks

For the output standardization: a dictionary structure, even a nested one, would probably be OK. A dedicated output class, like in transformers models, could also be considered, but it is probably not necessary here. To make the output compatible with e.g. Keras, we could add a postprocess function at initialization, similar to a transform in datasets.

There are three options we could implement:

load_metric(..., postprocess="metric_key") # equivalent result to `metric.compute()["metric_key"]`
load_metric(..., postprocess="flatten") # equivalent to flattening the output dict: `flatten(metric.compute())`
load_metric(..., postprocess=func) # equivalent result to `func(metric.compute())`
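
A rough sketch of what such a postprocess step could do to the dict returned by metric.compute(); the flatten helper and the dispatch on the keyword are assumptions for illustration, not an existing API:

from typing import Any, Callable, Dict, Union

def flatten(d: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    # Flatten a nested result dict into {"outer_inner": value} pairs.
    flat = {}
    for key, value in d.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name))
        else:
            flat[name] = value
    return flat

def apply_postprocess(result: Dict[str, Any], postprocess: Union[str, Callable, None]) -> Any:
    # Hypothetical step run on the dict returned by metric.compute().
    if postprocess is None:
        return result
    if callable(postprocess):
        return postprocess(result)   # postprocess=func
    if postprocess == "flatten":
        return flatten(result)       # postprocess="flatten"
    return result[postprocess]       # postprocess="metric_key"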

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 27 (26 by maintainers)

Top GitHub Comments

2 reactions
sashavor commented, Apr 8, 2022

As per our meeting today, we proposed having a standardized structure for inputs, in dictionary form.

An initial proposal of that structure can be:


     {
         "references":  ...,
         "predictions": ...,
     }

The references and predictions can be of any format (strings, images, numbers, vectors, etc.). I was looking at examples of computer vision metrics and this should work for those as well.
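
For example, consuming that structure with an existing metric could look roughly like this (unpacking the dict into compute() keyword arguments is one possible convention, not a settled API; accuracy is just used as a stand-in):

import evaluate

# Proposed standardized input: a single dict with "references" and "predictions".
inputs = {
    "references":  [0, 1, 1, 0],   # could also be strings, images, vectors, ...
    "predictions": [0, 1, 0, 0],
}

accuracy = evaluate.load("accuracy")
print(accuracy.compute(**inputs))  # {'accuracy': 0.75}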

Edge cases:

  • COMET, WikiSplit and SARI – take an additional input, sources
  • F1, Precision and Recall – require an average parameter for multiclass labels, but we could define a default if needed
  • Perplexity – needs an input string and a model

I think we could have additional, optional source and average inputs, but I don’t really know what to do for perplexity 😅 (in any case, the metrics will not function without these arguments, but I guess waiting for them to crash isn’t the best solution).
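
A hedged sketch of how those optional inputs could slot into the same dict structure; the extra keys (sources, average, model_id) are assumptions about how the edge cases above might be expressed, not current behaviour:

# COMET / WikiSplit / SARI: an optional "sources" key alongside the usual pair.
comet_inputs = {
    "sources":     ["Dem Feuer konnte Einhalt geboten werden."],
    "predictions": ["The fire could be stopped."],
    "references":  ["They were able to stop the fire."],
}

# F1 / Precision / Recall: an optional "average" setting with a library default.
f1_inputs = {
    "predictions": [0, 2, 1, 0, 0, 2],
    "references":  [0, 1, 2, 0, 1, 2],
    "average":     "macro",
}

# Perplexity: needs input texts plus a model, so it only half-fits the scheme.
perplexity_inputs = {
    "predictions": ["The quick brown fox jumps over the lazy dog."],
    "model_id":    "gpt2",
}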

CC @lvwerra @lhoestq @apsdehal

1 reaction
lvwerra commented, May 19, 2022

I have been thinking about the scalar vs. dict question. Having a dict across all metrics, at least internally, is nice as it allows us to treat them all the same way, and we can also combine metrics by merging dicts. At the same time, we could check if the result is just a dict with one value and, if so, return only that value.

metric = evaluate.load("accuracy")
metric.compute(predictions=preds, references=refs)
>>> 0.6

metric = evaluate.load("accuracy", force_dict=True)
metric.compute(predictions=preds, references=refs)
>>> {"accuracy": 0.6}
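
A minimal sketch of the unwrapping rule described above; force_dict is the flag proposed in the snippet, not an existing argument:

def postprocess_result(result: dict, force_dict: bool = False):
    # Return the bare value when the dict has a single entry,
    # unless force_dict=True keeps the dict form (e.g. for framework compatibility).
    if not force_dict and len(result) == 1:
        return next(iter(result.values()))
    return result

postprocess_result({"accuracy": 0.6})                   # -> 0.6
postprocess_result({"accuracy": 0.6}, force_dict=True)  # -> {'accuracy': 0.6}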

What do you think @lhoestq?

Regarding Keras, I’ll think a bit more about how to do that smoothly.
