spaCy NER Scorer returns all zeros and duplicate labels
Hi all,
I am trying to get scores on my test set using Scorer. Here is my simple code:
# nlp is the trained pipeline, loaded beforehand (e.g. with spacy.load())
from spacy.gold import GoldParse
from spacy.scorer import Scorer

test_set = [
    ('Your 40577 is finished', [(6, 11, 'LABEL1')]),
    ('Finished with SODE20915', [(14, 23, 'LABEL2')])
]

scorer = Scorer()
for text, annot in test_set:
    doc_gold_text = nlp(text)  # here I also tried: doc_gold_text = nlp.make_doc(text)
    gold = GoldParse(doc_gold_text, entities=annot)
    pred_value = nlp(text)
    # print(gold.words, gold.tags, gold.labels, gold.ner)
    scorer.score(pred_value, gold)
print(scorer.scores)
What I get as output is all zeros, and each label shows up twice, once as LABEL1 and once as "'LABEL1'" (with the quotes included):
{'uas': 0.0, 'las': 0.0,
 'las_per_type': {'': {'p': 0.0, 'r': 0.0, 'f': 0.0}},
 'ents_p': 0.0, 'ents_r': 0.0, 'ents_f': 0.0,
 'ents_per_type': {"'LABEL1'": {'p': 0.0, 'r': 0.0, 'f': 0.0},
                   'LABEL1': {'p': 0.0, 'r': 0.0, 'f': 0.0},
                   "'LABEL2'": {'p': 0.0, 'r': 0.0, 'f': 0.0},
                   'LABEL2': {'p': 0.0, 'r': 0.0, 'f': 0.0}},
 'tags_acc': 0.0, 'token_acc': 100.0,
 'textcat_score': 0.0, 'textcats_per_cat': {}}
I don’t understand why I have duplicate labels. Also, the model made correct predictions in both sentences, yet the P, R, and F1 scores are all 0. Maybe I missed something?
When I print the GoldParse object (uncommenting the print line in the loop), I get this:
['Your', '40577', 'is', 'finished'] [None, None, None, None] [None, None, None, None] ['O', 'U-CNTNUM', 'O', 'O']
['Finished', 'with', '41735'] [None, None, None] [None, None, None] ['O', 'O', 'U-CNTNUM']
I checked my training data and it seems OK. Here is an example of the training data:
[('This paper BIME 13935 will be done soon', {'entities': [(11, 21, 'LABEL2')]}),
('What is status of 43391', {'entities': [(18, 23, 'LABEL1')]})]
The model was trained on a GPU using minibatches.
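For reference, a minimal sketch of what such a spaCy 2.x minibatch NER training loop typically looks like (this is the standard v2 training pattern, not the author's actual script; TRAIN_DATA stands for the list of (text, {'entities': ...}) pairs shown above):

import random
import spacy
from spacy.util import minibatch, compounding

spacy.require_gpu()                        # train on the GPU, as in the report
nlp = spacy.blank('en')
ner = nlp.create_pipe('ner')
nlp.add_pipe(ner)
for _, annotations in TRAIN_DATA:
    for start, end, label in annotations['entities']:
        ner.add_label(label)               # register every entity label with the NER pipe

optimizer = nlp.begin_training()
for itn in range(20):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for batch in minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001)):
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, sgd=optimizer, drop=0.2, losses=losses)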
Your Environment
- Operating System: Linux Ubuntu
- Python Version Used: Python 3.7
- spaCy Version Used: spaCy 2.2.4
- Environment Information: GPU
Top GitHub Comments
Oh I missed that, now it works. Great, thank you!
Thanks, a full example makes this much easier to troubleshoot!
This may not be the same problem as in your original code, which doesn’t have this mistake, but in the example above the error is in the document passed to the scorer: you’re giving it the blank, tokens-only doc instead of the annotated one. The doc passed to scorer.score() needs to come from running the full trained pipeline on the raw text.
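A minimal sketch of the corrected loop, assuming the spaCy 2.x Scorer/GoldParse API and the trained nlp pipeline from the question (the exact snippet from the original comment is not reproduced here):

from spacy.gold import GoldParse
from spacy.scorer import Scorer

scorer = Scorer()
for text, annot in test_set:
    doc_gold_text = nlp.make_doc(text)   # tokens-only doc that carries the gold annotations
    gold = GoldParse(doc_gold_text, entities=annot)
    pred_value = nlp(text)               # run the full trained pipeline on the raw text
    scorer.score(pred_value, gold)       # score the predicted doc, not doc_gold_text
print(scorer.scores)

The key point is that scorer.score() compares the entities predicted on the doc you pass in against the gold annotations; passing the tokens-only doc_gold_text, which has no entities set, gives precision, recall, and F1 of 0.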
With this change you can see non-zero scores for the demo example.