Entity extracted at evaluation doesn't show up using the imported model
Hi,
I trained a custom NER model on our own labelled dataset on financial risk. I also did pretraining, and the model finished with a score of 97.48.
Training pipeline: ner
Starting with blank model 'en'
2511 training docs
267 evaluation docs
============================== Vocab & Vectors ==============================
ℹ 101601 total words in the data (12951 unique)
ℹ No word vectors present in the model
========================== Named Entity Recognition ==========================
ℹ 1 new label, 0 existing labels
0 missing values (tokens with '-' label)
✔ Good amount of examples for all labels
✔ Examples without occurrences available for all labels
✔ No entities consisting of or starting/ending with whitespace
✔ No entities consisting of or starting/ending with punctuation
When evaluating the model on the dev set, entities got picked up just fine, but there is one entity (at least one I’ve noticed), US Treasury Department’s Office of Foreign Assets Control,
that doesn’t show up in the same sentences when importing and testing the best-model in a notebook.
Then I ran a test on every single sentence (150) containing the missing entity. Three returned it partially, as: Department’s Office of Foreign Assets Control
but nothing more.
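A check like the one described above could be sketched as follows. This is only an illustration, not the code used in the report: `classify_predictions` is a hypothetical helper, and it relies only on the pipeline being callable on a string and returning an object with `.ents`, where each entity has `.text`, as spaCy's `Doc` does. "Partial" here means a predicted entity whose text is a substring of the target.

```python
from collections import Counter

# The entity reported as missing (note the curly apostrophe, as in the report).
TARGET = "US Treasury Department’s Office of Foreign Assets Control"


def classify_predictions(nlp, sentences, target=TARGET):
    """Count sentences where `target` is predicted fully, partially, or not at all.

    `nlp` is any callable that maps a string to an object with an `.ents`
    sequence whose items expose `.text` (a loaded spaCy pipeline qualifies).
    """
    counts = Counter()
    for sentence in sentences:
        doc = nlp(sentence)
        predicted = [ent.text for ent in doc.ents]
        if target in predicted:
            # An entity with exactly the target text was predicted.
            counts["full"] += 1
        elif any(text and text in target for text in predicted):
            # Some predicted entity is a fragment of the target text.
            counts["partial"] += 1
        else:
            counts["missing"] += 1
    return counts
```

With spaCy installed, this could be called as `classify_predictions(spacy.load("path/to/best-model"), sentences)`, where the model path and the `sentences` list are placeholders for your own.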
There are quite a few other entities, like Department of Justice (130), Department of State (70) and US Department of the Treasury (40), which contain similar wording. Could these potentially conflict with the missing entity, US Treasury Department’s Office of Foreign Assets Control?
However, this still wouldn’t answer why it is present in the evaluation sample but missing in production.
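To see how often similarly worded entities occur in the annotations, the surface forms could be counted along these lines. This is a rough sketch under assumptions: the `(text, [(start, end, label), ...])` layout and both function names are hypothetical, so adapt them to however your training data is stored.

```python
from collections import Counter


def surface_form_counts(examples, label=None):
    """Count annotated entity strings.

    `examples` is assumed to be an iterable of
    (text, [(start_char, end_char, label), ...]) pairs.
    If `label` is given, only spans with that label are counted.
    """
    counts = Counter()
    for text, spans in examples:
        for start, end, span_label in spans:
            if label is None or span_label == label:
                counts[text[start:end]] += 1
    return counts


def overlapping_forms(counts, target):
    """Return {form: (count, shared_tokens)} for forms sharing a token with `target`.

    Tokens are split on whitespace only, so even a common word such as
    "of" counts as shared; filter further if that is too noisy.
    """
    target_tokens = set(target.split())
    result = {}
    for form, n in counts.items():
        shared = target_tokens & set(form.split())
        if form != target and shared:
            result[form] = (n, sorted(shared))
    return result
```

Comparing these counts with how often the missing entity itself is annotated gives a rough picture of how much competing evidence the model saw, though it does not by itself explain a difference between evaluation and notebook results.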
Btw, there’s a permutation of the missing entity, US Treasury’s Office of Foreign Assets Control,
which pops up perfectly in any tested sentence, which puzzles me even more.
Using the latest version of spaCy. Thanks.
Issue Analytics
- Created 3 years ago
- Comments: 18 (9 by maintainers)
Top GitHub Comments
Hi @svlandeg, I appreciate your help so much! Yes, this makes sense and will help us in preparing our data in such a manner that’s consistent from training to production. Can’t wait to dive in and do some refactoring.
Thanks again!
Yes, I received it, thanks!