Possible bug with calculation of Jaccard similarity
Firstly, thanks a lot for this wonderful library, and also for adding transformers in the latest release.
Describe the bug
I trained a model for multi-label classification. After training, when I compute the Jaccard similarity on the test-set predictions (intersection over union per example, averaged over examples), the score does not match the value reported by the library: test_statistics reports a Jaccard of 0.6661, while I calculate 0.5587. This issue was not present in the previous version.
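For clarity, a minimal sketch of the per-example intersection-over-union computation being described (the label names are made up for illustration):

```python
# Per-example Jaccard similarity: |true ∩ pred| / |true ∪ pred|,
# averaged over all test examples.
true_set = {"toxic", "insult"}    # hypothetical ground-truth labels for one row
pred_set = {"toxic", "obscene"}   # hypothetical predicted labels for the same row
print(len(true_set & pred_set) / len(true_set | pred_set))  # 1 / 3 ≈ 0.333
```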
To Reproduce
Steps to reproduce the behavior:
- Save the following config as model_definition.yaml:

```yaml
training:
    epochs: 1
    validation_field: labels
    validation_measure: jaccard
input_features:
    -
        name: comment_text
        type: sequence
        sequence_length_limit: 128
        representation: dense
        lowercase: true
        embedding_size: 256
        cell_type: lstm
        reduce_output: null
        num_layers: 1
        bidirectional: true
output_features:
    -
        name: labels
        type: set
        validation_field: jaccard_index
```
- Run:

```
ludwig experiment -rs 42 --training_set data/train_data.csv \
    --test_set data/test_data.csv --data_format csv -cf model_definition.yaml
```
- After one epoch, in order to calculate the Jaccard score, run:

```python
import csv

import pandas as pd


def compute_jaccard():
    # read the ground-truth labels from the test file
    df_test = pd.read_csv("data/test_data.csv")
    true_labels = list(df_test["labels"])
    # read the predicted labels written by Ludwig
    pred_labels = []
    with open("results/experiment_run/labels_predictions.csv") as csvfile:
        label_reader = csv.reader(csvfile, delimiter=',')
        for row in label_reader:
            all_sectors = ' '.join(row)
            pred_labels.append(all_sectors)
    # compute the mean per-example Jaccard similarity
    list_jaccard = []
    for str_true, str_pred in zip(true_labels, pred_labels):
        set_true = set(str_true.split())
        set_pred = set(str_pred.split())
        tp = len(set_true.intersection(set_pred))
        union = len(set_true.union(set_pred))
        list_jaccard.append(tp / union)
    jaccard = sum(list_jaccard) / len(list_jaccard)
    return jaccard


print(compute_jaccard())
```
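As a cross-check, assuming scikit-learn is available, the same per-row Jaccard can be computed with jaccard_score (a sketch; `true_labels` and `pred_labels` are the space-separated label strings built in the script above):

```python
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import jaccard_score


def compute_jaccard_sklearn(true_labels, pred_labels):
    # Binarize both columns over the union of all labels seen in either one.
    mlb = MultiLabelBinarizer()
    mlb.fit([s.split() for s in true_labels] + [s.split() for s in pred_labels])
    y_true = mlb.transform([s.split() for s in true_labels])
    y_pred = mlb.transform([s.split() for s in pred_labels])
    # average='samples' computes |intersection| / |union| per row and averages
    # over rows, i.e. the same quantity as the manual loop above
    # (rows with an empty union are treated as 0 and trigger a warning).
    return jaccard_score(y_true, y_pred, average='samples')
```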
Environment:
- OS: macOS Mojave
- Version: 10.14.16
- Python version: 3.7.3
- Ludwig version: 0.3
Additional context
I used the same logic to calculate the Jaccard score with Ludwig version 0.2.2.8, and the issue does not occur there.
Top GitHub Comments
I checked and it’s giving correct results. Thank you very much for this quick fix! 😃
@jenishah Thank you for providing a complete and detailed description of the issue. I was able to use the code and data you provided to reproduce it on my side. I'm still in the process of looking into the root cause. One thing I can point to already: instead of manually calculating the metric, Ludwig uses the Keras metric class MeanIoU in v0.3. Right now I'm assessing how the difference comes about; this may take me a day or two.
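For reference, a minimal sketch (assuming TensorFlow 2.x, and not necessarily how Ludwig wires the metric internally) of how tf.keras.metrics.MeanIoU can disagree with a per-example Jaccard over the positive labels: MeanIoU builds a confusion matrix over class indices and averages the IoU of every class, including the "label absent" class 0.

```python
import tensorflow as tf

# One example with 4 possible labels, as binary indicator vectors.
y_true = [1, 1, 1, 0]   # true label set contains 3 of the 4 labels
y_pred = [1, 1, 0, 0]   # predicted label set contains 2 of them

# Jaccard over the positive labels (what the manual script computes):
# |intersection| / |union| = 2 / 3 ≈ 0.667

# Keras MeanIoU over classes {0, 1}: IoU(class 1) = 2/3, IoU(class 0) = 1/2,
# so the mean is ≈ 0.583 -- a different number for the same prediction.
m = tf.keras.metrics.MeanIoU(num_classes=2)
m.update_state(y_true, y_pred)
print(m.result().numpy())  # ~0.5833
```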