What is the F1-score of your trained NEL model on dev data?
Thank you very much for your open-source tools! What is the F1-score of your trained NEL model? How can I evaluate its performance on my own, non-Wikipedia articles? Is there a demo page that shows the NEL tool's performance?
Your Environment
- Operating System:
- Python Version Used: 3.6
- spaCy Version Used: 2.2.5
- Environment Information:
Top GitHub Comments
Hi @whuFSN: we don’t have a pretrained NEL model yet, as the algorithms are still under development; this is also why we’re not distributing one. Measuring accuracy is actually a bit of a challenge: Wikipedia data is highly biased in its annotations, and we feel that measuring performance on it does not really reflect how well the model generalizes.
I recently trained an NEL algorithm on 165K Wikipedia articles using this code, and manually annotated a set of news articles with Prodigy. The evaluation set contained 360 links (which is not that many).
This was the performance I got:
The “prior probability” baseline is very strong: it indicates how likely a mention resolves to a certain ID without taking any context into account (i.e. any mention of “Obama” most likely refers to the former US president, even though there are surely many more Obamas in the world). If we combine the EL algorithm (which looks at the context of the sentence) with the prior probability, we see a bump in performance from 67% to 71%.
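As a rough illustration of that combination (not the actual spaCy implementation; the candidate class, the 50/50 weighting, and the scores below are made up), resolving a mention comes down to scoring each candidate entity by its prior probability, optionally mixed with a context-similarity score:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    entity_id: str        # e.g. a Wikidata QID; "Q76" is Barack Obama
    prior_prob: float     # P(entity | mention), estimated from Wikipedia link counts
    context_score: float  # similarity between the sentence context and the entity description


def resolve(candidates: List[Candidate], use_context: bool = True) -> Optional[str]:
    """Pick the most likely entity ID for a mention.

    With use_context=False this is the 'prior probability' baseline;
    with use_context=True the prior is mixed with the context score
    (a fixed 50/50 mix here; the real model learns how to weight the signals).
    """
    if not candidates:
        return None
    if use_context:
        score = lambda c: 0.5 * c.prior_prob + 0.5 * c.context_score
    else:
        score = lambda c: c.prior_prob
    return max(candidates, key=score).entity_id


# "Obama" almost always refers to the former US president (high prior),
# but a strong context signal can overrule the prior for a less famous namesake.
candidates = [
    Candidate("Q76", prior_prob=0.90, context_score=0.10),      # Barack Obama
    Candidate("Q_other", prior_prob=0.10, context_score=0.95),  # hypothetical other Obama
]
print(resolve(candidates, use_context=False))  # prior-only baseline -> "Q76"
print(resolve(candidates, use_context=True))   # context-aware -> "Q_other"
```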
Because not every entity in the world can realistically be included in a knowledge base, your upper bound is always less than 100%. In this evaluation, about 15% of the entities in the news articles could not be resolved to the KB (think of random local people who are mentioned in the news exactly once and whom you wouldn’t even be able to find on Google).
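To make those numbers concrete, here is a minimal sketch of how precision, recall, F1 and that KB-coverage upper bound could be computed over such an annotated evaluation set; the data structures and the tiny example are illustrative, not the code used for the evaluation above:

```python
from typing import Dict, Optional, Tuple

Span = Tuple[int, int]  # (start_char, end_char) of a mention


def el_scores(gold: Dict[Span, Optional[str]], pred: Dict[Span, Optional[str]]):
    """Micro precision/recall/F1 for entity linking, plus the KB-coverage upper bound.

    gold maps each annotated mention to its entity ID, or None if the entity
    is not in the knowledge base (NIL); pred maps mentions to predicted IDs,
    or None if the model abstains.
    """
    correct = sum(1 for span, ent in gold.items() if ent is not None and pred.get(span) == ent)
    n_pred = sum(1 for ent in pred.values() if ent is not None)  # links the model produced
    n_gold = len(gold)                                           # all annotated mentions
    in_kb = sum(1 for ent in gold.values() if ent is not None)   # mentions resolvable to the KB

    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    # Mentions whose entity is missing from the KB can never be linked correctly,
    # so this is the best recall any KB-bound model could reach (e.g. ~0.85 if ~15% are NIL).
    upper_bound = in_kb / n_gold if n_gold else 0.0
    return precision, recall, f1, upper_bound


gold = {(0, 5): "Q76", (10, 20): None}  # second mention is NIL (not in the KB)
pred = {(0, 5): "Q76", (10, 20): "Q_wrong"}
print(el_scores(gold, pred))            # (0.5, 0.5, 0.5, 0.5)
```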
@whuFSN: to reduce the training time, have a look at this thread for some advice I wrote on increasing speed.
@umarbutler: the current CLI train scripts do not yet support the NEL task, so there is no meta.json when you train them. For a simple example of how to train an NEL algorithm, see this script.