Defining and standardizing metric input structures
Documenting the types of inputs and data structures used for each metric
predictions = Sequence(Value("int32"))
references = Sequence(Value("int32"))
or if "multilabel" mode:
predictions= Value("int32")
references = Value("int32")
predictions = Value("string", id="sequence")
references = Sequence(Value("string", id="sequence"), id="references")
predictions = Sequence(Value("string", id="token"), id="sequence")
references = Sequence(Sequence(Value("string", id="token"), id="sequence"), id="references")
predictions = Value("string", id="sequence")
references = Value("string", id="sequence")
predictions = Value("string", id="sequence")
references = Value("string", id="sequence")
predictions = Value("string", id="sequence")
references = Sequence(Value("string", id="sequence"), id="references")
predictions = Sequence(Value("string"))
references = Value("string")
sources = Value("string", id="sequence")
predictions = Value("string", id="sequence")
references = Value("string", id="sequence")
predictions = Value("string")
references = Value("string")
predictions = Sequence(Value("string"))
references = Sequence(Value("string"))
N.B. The sentences have to be in CoNLL format, which may be tricky to handle in some cases
{
    "predictions": {
        "id": Value("string"),
        "prediction_text": Sequence(Value("string")),
    },
    "references": {
        "id": Value("string"),
        "answers": Sequence(
            {
                "text": Value("string"),
                "answer_start": Value("int32"),
            }
        ),
    },
}
predictions = Value("string", id="sequence")
references = Value("string", id="sequence")
predictions = Sequence(Value("int32"))
references = Sequence(Value("int32"))
references = Value("string")
predictions = Value("string")
predictions = Sequence(Value("string", id="token"), id="sequence")
references = Sequence(Sequence(Value("string", id="token"), id="sequence"), id="references")
predictions = Value("int64" if self.config_name != "stsb" else "float32")
references = Value("int64" if self.config_name != "stsb" else "float32")
The type of input depends on the GLUE subset used.
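As a rough illustration of the config-dependent typing above, the selection can be sketched in plain Python. The helper name `glue_value_dtype` is hypothetical; it only mirrors the conditional expression quoted above and is not part of the library.

```python
def glue_value_dtype(config_name: str) -> str:
    """Return the dtype string used for both predictions and references.

    Hypothetical helper: it mirrors the conditional quoted above, where the
    "stsb" subset uses float32 and every other subset uses int64.
    """
    return "float32" if config_name == "stsb" else "int64"


# Print the dtype selected for a few subset names.
for name in ("stsb", "mrpc", "sst2"):
    print(name, glue_value_dtype(name))
```

In practice this means "stsb" inputs are floats, while the other subsets take integer values.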
predictions = Sequence(Value("string", id="token"), id="sequence")
references = Sequence(Sequence(Value("string", id="token"), id="sequence"), id="references")
predictions = Value("int64") if self.config_name != "cvit-mkb-clsr" else Sequence(Value("float32"))
references = Value("int64") if self.config_name != "cvit-mkb-clsr" else Sequence(Value("float32"))
predictions = Value("float")
references = Value("float")
or if multilist:
predictions = Sequence(Value("float"))
references = Sequence(Value("float"))
"X": Sequence(Value("float", id="sequence"), id="X")
reference_distribution = np.array(reference_distribution)
N.B. the names for references and predictions are different here – maybe we should standardize? wdyt @lhoestq
predictions = Value("int32")
references = Value("int32")
predictions = Value("string", id="sequence")
references = Value("string", id="sequence")
predictions = Sequence(Sequence(Value("uint16")))
references = Sequence(Sequence(Value("uint16")))
What’s a uint16? unicode? this is the only metric with a unicode restriction (so far).
predictions = Value("string", id="sequence")
references = Value("string", id="sequence")
predictions = Value("float")
references = Value("float")
or if multilist:
predictions = Sequence(Value("float"))
references = Sequence(Value("float"))
references = Value("float")
predictions = Value("float")
input_texts = Value("string")
predictions = Value("int32")
references = Value("int32")
or if multilist:
predictions = Sequence(Value("int32"))
references = Sequence(Value("int32"))
predictions = Value("int32")
references = Value("int32")
or if multilist:
predictions = Sequence(Value("int32"))
references = Sequence(Value("int32"))
predictions = Value("string", id="sequence")
references = Value("string", id="sequence")
predictions = Value("string", id="sequence")
references = Sequence(Value("string", id="sequence"), id="references")
sources = Value("string", id="sequence")
predictions = Value("string", id="sequence")
references = Sequence(Value("string", id="sequence"), id="references")
predictions = Sequence(Value("string", id="label"), id="sequence")
references = Sequence(Value("string", id="label"), id="sequence")
N.B. both predictions and references are in IOB format
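To make the IOB note concrete, here is a minimal sketch of what such inputs could look like. The `is_iob_label` helper and the entity types (`PER`, `LOC`) are made up for illustration, and the check is deliberately loose; it is not the metric's own validation logic.

```python
def is_iob_label(label: str) -> bool:
    """Loose, illustrative check: "O", or "B-"/"I-" followed by an entity type.

    Hypothetical helper; the metric itself may accept other tagging schemes.
    """
    if label == "O":
        return True
    prefix, sep, entity = label.partition("-")
    return prefix in ("B", "I") and sep == "-" and entity != ""


# One list of label strings per sentence, matching
# Sequence(Value("string", id="label"), id="sequence").
predictions = [["O", "B-PER", "I-PER", "O"], ["B-LOC", "O"]]
references = [["O", "B-PER", "I-PER", "O"], ["B-LOC", "I-LOC"]]

all_labels = [tag for seq in predictions + references for tag in seq]
print(all(is_iob_label(tag) for tag in all_labels))
```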
predictions = Value("float")
references = Value("float")
"predictions": {"id": Value("string"), "prediction_text": Value("string")},
"references": {
    "id": Value("string"),
    "answers": features.Sequence(
        {
            "text": Value("string"),
            "answer_start": Value("int32"),
        }
    ),
},
"predictions": {
    "id": Value("string"),
    "prediction_text": Value("string"),
    "no_answer_probability": Value("float32"),
},
"references": {
    "id": Value("string"),
    "answers": features.Sequence(
        {"text": Value("string"), "answer_start": Value("int32")}
    ),
},
N.B. The SQuAD and SQuAD v2 formats differ in that v2 has the 'no_answer_probability' tag in predictions.
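The difference between the two prediction formats can be sketched with plain dictionaries. The helper `to_v2_prediction`, the example values, and the default probability of 0.0 are all assumptions made for illustration, not library behaviour.

```python
def to_v2_prediction(v1_prediction: dict, no_answer_probability: float = 0.0) -> dict:
    """Copy a v1-style prediction and add the extra field that v2 expects.

    Hypothetical helper; the default of 0.0 is an arbitrary placeholder.
    """
    v2_prediction = dict(v1_prediction)  # shallow copy, input left untouched
    v2_prediction["no_answer_probability"] = float(no_answer_probability)
    return v2_prediction


v1_prediction = {"id": "q1", "prediction_text": "Paris"}
v2_prediction = to_v2_prediction(v1_prediction)

# The only key that differs between the two formats.
print(set(v2_prediction) - set(v1_prediction))
```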
if self.config_name == "record":
    return {
        "predictions": {
            "idx": {
                "passage": Value("int64"),
                "query": Value("int64"),
            },
            "prediction_text": Value("string"),
        },
        "references": {
            "idx": {
                "passage": Value("int64"),
                "query": Value("int64"),
            },
            "answers": Sequence(datasets.Value("string")),
        },
    }
elif self.config_name == "multirc":
    return {
        "predictions": {
            "idx": {
                "answer": Value("int64"),
                "paragraph": Value("int64"),
                "question": Value("int64"),
            },
            "prediction": Value("int64"),
        },
        "references": Value("int64"),
    }
else:
    return {
        "predictions": Value("int64"),
        "references": Value("int64"),
    }
predictions = Value("string", id="sequence")
references = Sequence(Value("string", id="sequence"), id="references")
predictions = Value("string", id="sequence")
references = Value("string", id="sequence")
predictions = Value("string", id="sequence")
references = Sequence(Value("string", id="sequence"), id="references")
predictions = Value("int64" if self.config_name != "sts-b" else "float32")
references = Value("int64" if self.config_name != "sts-b" else "float32")
pred_type = "int64" if self.config_name in ["fleurs-lang_id", "minds14"] else "string"
predictions = Value(pred_type)
references = Value(pred_type)
N.B. the input depends on the XTREME-S dataset selected
Issue Analytics
- State:
- Created a year ago
- Comments:19 (19 by maintainers)
Top GitHub Comments
@lhoestq indeed the data is cast there; however, when deactivating it, it is still cast later on by pyarrow.array (see docs). The main issue is string types, as everything can be cast to a string. Ideally we would have a mechanism that checks all types, but this could be quite extensive. So maybe to start we could just check if something that should be a string is really a string. What do you think?
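A minimal sketch of the "is it really a string" check proposed above, in plain Python. The function name `find_non_strings` and the path-style reporting are hypothetical and not part of the library. The idea is to run such a check before any casting happens, so that non-string inputs are reported instead of being silently converted.

```python
def find_non_strings(values, path="predictions"):
    """Return (path, value) pairs for every leaf that is not a str.

    Hypothetical helper: it walks nested lists/tuples, so it covers both a
    flat list of strings and a list of lists of strings.
    """
    problems = []
    for i, value in enumerate(values):
        here = f"{path}[{i}]"
        if isinstance(value, (list, tuple)):
            problems.extend(find_non_strings(value, here))
        elif not isinstance(value, str):
            problems.append((here, value))
    return problems


# The integer and the float are reported; the strings pass.
print(find_non_strings(["a", 1, ["b", 2.5]]))
```

An empty result would mean every leaf is a genuine string. Anything else could be raised as an error before the inputs are cast.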
Indeed currently it tries to cast the type here:
https://github.com/huggingface/evaluate/blob/df3d20712df202b586f73cf45a66b65652e45d5b/src/evaluate/metric.py#L464
You can try removing this line, it should fix this issue 😃