What is the F1-score of your trained NEL model on dev data?
Thank you very much for your open-source tools! What is the F1-score of your trained NEL model? How can I evaluate its performance on my own, non-Wikipedia articles? Is there a demo page that shows the NEL tool's performance?
Your Environment
- Operating System:
- Python Version Used: 3.6
- spaCy Version Used: 2.2.5
- Environment Information:
Top GitHub Comments
Hi @whuFSN: we don’t have a pretrained NEL model yet, as the algorithms are still under development; this is also why we’re not distributing one. Measuring accuracy is actually a bit of a challenge: Wikipedia data is highly biased in its annotations, and we feel that measuring performance on it does not really reflect how well the model generalizes.
I recently trained an NEL algorithm on 165K Wikipedia articles using this code, and manually annotated a set of news articles with Prodigy. The evaluation set contained 360 links (which is not that many).
This was the performance I got:
The “prior probability” baseline is very strong: it indicates how likely a mention resolves to a certain ID without taking any context into account (i.e. any mention of “Obama” most likely refers to the former US president, even though there are surely many more Obamas in the world). If we combine the EL algorithm (which looks at the context of the sentence) with the prior probability, we see a bump in performance from 67% to 71%.
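As a rough illustration of that combination (not the actual spaCy implementation; the candidate class, the 50/50 weighting, and the scores below are made up), resolving a mention comes down to scoring each candidate entity by its prior probability, optionally mixed with a context-similarity score:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    entity_id: str        # e.g. a Wikidata QID; "Q76" is Barack Obama
    prior_prob: float     # P(entity | mention), estimated from Wikipedia link counts
    context_score: float  # similarity between the sentence context and the entity description


def resolve(candidates: List[Candidate], use_context: bool = True) -> Optional[str]:
    """Pick the most likely entity ID for a mention.

    With use_context=False this is the 'prior probability' baseline;
    with use_context=True the prior is mixed with the context score
    (a fixed 50/50 mix here; the real model learns how to weight the signals).
    """
    if not candidates:
        return None
    if use_context:
        score = lambda c: 0.5 * c.prior_prob + 0.5 * c.context_score
    else:
        score = lambda c: c.prior_prob
    return max(candidates, key=score).entity_id


# "Obama" almost always refers to the former US president (high prior),
# but a strong context signal can overrule the prior for a less famous namesake.
candidates = [
    Candidate("Q76", prior_prob=0.90, context_score=0.10),      # Barack Obama
    Candidate("Q_other", prior_prob=0.10, context_score=0.95),  # hypothetical other Obama
]
print(resolve(candidates, use_context=False))  # prior-only baseline -> "Q76"
print(resolve(candidates, use_context=True))   # context-aware -> "Q_other"
```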
Because not every entity in the world can realistically be included in a knowledge base, your upper bound is always less than 100%. In this evaluation, about 15% of the entities in the news articles could not be resolved to the KB (think of random local people who are mentioned in the news exactly once and whom you wouldn’t even be able to find on Google).
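To make those numbers concrete, here is a minimal sketch of how precision, recall, F1 and that KB-coverage upper bound could be computed over such an annotated evaluation set; the data structures and the tiny example are illustrative, not the code used for the evaluation above:

```python
from typing import Dict, Optional, Tuple

Span = Tuple[int, int]  # (start_char, end_char) of a mention


def el_scores(gold: Dict[Span, Optional[str]], pred: Dict[Span, Optional[str]]):
    """Micro precision/recall/F1 for entity linking, plus the KB-coverage upper bound.

    gold maps each annotated mention to its entity ID, or None if the entity
    is not in the knowledge base (NIL); pred maps mentions to predicted IDs,
    or None if the model abstains.
    """
    correct = sum(1 for span, ent in gold.items() if ent is not None and pred.get(span) == ent)
    n_pred = sum(1 for ent in pred.values() if ent is not None)  # links the model produced
    n_gold = len(gold)                                           # all annotated mentions
    in_kb = sum(1 for ent in gold.values() if ent is not None)   # mentions resolvable to the KB

    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    # Mentions whose entity is missing from the KB can never be linked correctly,
    # so this is the best recall any KB-bound model could reach (e.g. ~0.85 if ~15% are NIL).
    upper_bound = in_kb / n_gold if n_gold else 0.0
    return precision, recall, f1, upper_bound


gold = {(0, 5): "Q76", (10, 20): None}  # second mention is NIL (not in the KB)
pred = {(0, 5): "Q76", (10, 20): "Q_wrong"}
print(el_scores(gold, pred))            # (0.5, 0.5, 0.5, 0.5)
```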
@whuFSN: to reduce the training time, have a look at this thread for some advice I wrote on increasing speed.
@umarbutler: the current CLI train scripts do not yet support the NEL task, so there is no meta.json when you train them. For a simple example of how to train an NEL algorithm, see this script.