User guide formulas for macro average recall and precision seem wrong
To clarify, the calculation provided by recall_score(y_true, y_pred, average='macro') is correct, but the formula in the User Guide seems to be wrong. Here's an example that shows that recall_score and precision_score calculate the correct numbers:
In [1]: import numpy as np
   ...: from sklearn.metrics import confusion_matrix, precision_score, recall_score
In [2]: y_true = [0, 1, 2, 0, 1, 2]
In [3]: y_pred = [0, 2, 1, 0, 0, 1]
In [4]: C = confusion_matrix(y_true, y_pred); C
Out[4]:
array([[2, 0, 0],
       [1, 0, 1],
       [0, 2, 0]])
In [5]: recall_score(y_true, y_pred, average='macro')
Out[5]: 0.3333333333333333
In [6]: precision_score(y_true, y_pred, average='macro')
Out[6]: 0.2222222222222222
In [7]: np.mean(np.diagonal(C) / np.sum(C, axis=1))
Out[7]: 0.3333333333333333
In [8]: np.mean(np.diagonal(C) / np.sum(C, axis=0))
Out[8]: 0.2222222222222222
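For completeness, here is a minimal standalone sketch (not part of the original session) that spells out the per-class values being averaged, using the same confusion matrix C as above:

# Per-class recall and precision from the confusion matrix above
# (rows are true labels, columns are predicted labels).
import numpy as np

C = np.array([[2, 0, 0],
              [1, 0, 1],
              [0, 2, 0]])

per_class_recall = np.diagonal(C) / C.sum(axis=1)     # divide by true counts:      [1.    0.    0.  ]
per_class_precision = np.diagonal(C) / C.sum(axis=0)  # divide by predicted counts: [0.667 0.    0.  ]

print(per_class_recall.mean())     # 0.3333..., matches recall_score(y_true, y_pred, average='macro')
print(per_class_precision.mean())  # 0.2222..., matches precision_score(y_true, y_pred, average='macro')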
However, this is what the User Guide gives for the precision and recall formulas:

[Screenshots of the User Guide's macro-averaged precision and recall formulas; images not reproduced here.]
So, the User Guide is saying that recall is calculated for each class l by dividing by the number of samples with predicted label l, but it should be dividing by the number of samples with true label l, and vice versa for the precision definition.
I think the confusing notation should be fixed, and where possible we should make the functions take their arguments in the order true, then predicted.
I believe strongly that precision and recall should be defined in terms of set notation. It makes it much easier to reason about variants of the metric (e.g. micro averages in multiclass classification where some class is considered the majority class to be ignored, or the usages of precision and recall in information retrieval and information extraction, where you can't presume to be able to count the total number of items being considered for prediction). It also makes the equivalence between F1 and the Dice coefficient more apparent.
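As a rough illustration of that point (my notation, not the guide's): writing $A$ for the set of predicted positives and $B$ for the set of true positives,

\[
P(A, B) = \frac{|A \cap B|}{|A|}, \qquad
R(A, B) = \frac{|A \cap B|}{|B|}, \qquad
F_1 = \frac{2 P R}{P + R} = \frac{2\,|A \cap B|}{|A| + |B|},
\]

which is exactly the Dice coefficient of $A$ and $B$.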
Hi, guys! I have a paper starting next Tuesday and am kind of busy with preparation this weekend. I will be back next week after my papers end. Hope you don't mind, @CameronBieganek 😄