Entity extracted at evaluation doesn't show up using the imported model

Hi,

Trained a custom NER model on our own labelled dataset on financial risk. Did pretraining as well, and the model finished with a score of 97.48.

Training pipeline: ner
Starting with blank model 'en'
2511 training docs
267 evaluation docs

============================== Vocab & Vectors ==============================
ℹ 101601 total words in the data (12951 unique)
ℹ No word vectors present in the model

========================== Named Entity Recognition ==========================
ℹ 1 new label, 0 existing labels
0 missing values (tokens with '-' label)
✔ Good amount of examples for all labels
✔ Examples without occurrences available for all labels
✔ No entities consisting of or starting/ending with whitespace
✔ No entities consisting of or starting/ending with punctuation

When evaluating the model on the dev set, entities got picked up just fine, but there is one entity, US Treasury Department’s Office of Foreign Assets Control (at least one I’ve noticed), that doesn’t show up in the same sentences when importing and testing the best-model in a notebook.

Then I ran a test on every single sentence (150) containing the missing entity. Only 3 returned it, and only partially, as: Department’s Office of Foreign Assets Control, but nothing more.
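
A per-sentence check like the one described above could be scripted roughly as follows. This is a minimal sketch, not the poster's actual code: `classify`, `tally`, and `stub_predict` are hypothetical names, and matching is done on plain strings, so any predicted entity that happens to be a substring of the target counts as a partial hit.

```python
from collections import Counter
from typing import Callable, Iterable, List

# Entity text as quoted in the issue (typographic apostrophe, U+2019).
TARGET = "US Treasury Department\u2019s Office of Foreign Assets Control"


def classify(target: str, predicted: Iterable[str]) -> str:
    """Label one sentence's predictions as 'full', 'partial', or 'missing'."""
    predicted = [p for p in predicted if p]
    if target in predicted:
        return "full"
    # String-based heuristic (an assumption of this sketch): a prediction
    # that is a substring of the target is treated as a partial hit.
    if any(p in target for p in predicted):
        return "partial"
    return "missing"


def tally(target: str, sentences: Iterable[str],
          predict: Callable[[str], List[str]]) -> Counter:
    """Count full/partial/missing outcomes over all sentences."""
    return Counter(classify(target, predict(s)) for s in sentences)


def stub_predict(sentence: str) -> List[str]:
    """Hypothetical stand-in for a model, used only to exercise the code."""
    partial = "Department\u2019s Office of Foreign Assets Control"
    return [partial] if partial in sentence else []
```

With a loaded spaCy pipeline `nlp`, the predictor passed to `tally` could be `lambda s: [ent.text for ent in nlp(s).ents]`, which collects the text of each predicted entity in a sentence.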

There are quite a few other entities, like Department of Justice (130), Department of State (70), and US Department of the Treasury (40), which contain similar wording. Can these potentially conflict with the missing entity, US Treasury Department’s Office of Foreign Assets Control? However, this still wouldn’t explain why it is present in the evaluation sample but missing in production.
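
When the same sentence behaves differently in evaluation and in a notebook, one thing that may be worth ruling out is that the two inputs are not actually identical at the character level (for example, a typographic versus a straight apostrophe). The excerpt here does not say this was the cause; the sketch below is only a diagnostic aid, and `char_diff` and `names` are hypothetical helper names built on the standard library.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import List, Tuple


def char_diff(a: str, b: str) -> List[Tuple[str, str, str]]:
    """Return (operation, segment_of_a, segment_of_b) for each differing span."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def names(segment: str) -> List[str]:
    """Unicode names of the characters in a segment, for readable reporting."""
    return [unicodedata.name(ch, "UNKNOWN") for ch in segment]
```

Calling `char_diff(dev_sentence, notebook_sentence)` on a sentence taken from the dev set and the string typed into the notebook returns an empty list when the two are identical; otherwise `names` makes look-alike characters easy to tell apart.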

Btw, there’s a variant of the missing entity, US Treasury’s Office of Foreign Assets Control, which is picked up perfectly in every tested sentence, which puzzles me even more.

Using the latest version of spaCy. Thanks.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 18 (9 by maintainers)

Top GitHub Comments

1 reaction
vedtam commented, Oct 13, 2020

Hi @svlandeg, I appreciate your help so much! Yes, this makes sense and will help us in preparing our data in such a manner that’s consistent from training to production. Can’t wait to dive in and do some refactoring.

Thanks again!
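
One way to keep data preparation consistent from training to production, as mentioned in the comment above, is to route both the training texts and the production inputs through a single shared function. The sketch below is an assumption for illustration only: the thread excerpt does not say which preparation steps were involved, and `normalize_text` and its replacement table are hypothetical. The replacements are deliberately one character for one character, so the character offsets of existing entity annotations stay valid.

```python
# Hypothetical, length-preserving replacements: each key maps to exactly one
# character, so annotation offsets computed on the raw text remain correct.
_REPLACEMENTS = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (typographic apostrophe)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u00a0": " ",   # no-break space
})


def normalize_text(text: str) -> str:
    """Apply the same character normalization to training and production text."""
    return text.translate(_REPLACEMENTS)
```

For this to help, the function has to be applied before training as well as before every call to the model; normalizing only at inference time would introduce a new mismatch.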

1 reaction
svlandeg commented, Sep 2, 2020

Yes, I received it, thanks!
