Confidence Score for NER
It is important for my current project to output a confidence score for each entity we get from NER. Doing my research, I found some existing issues on this topic. These were particularly useful for me: https://github.com/explosion/spaCy/issues/3996, https://github.com/explosion/spaCy/issues/881, https://github.com/explosion/spaCy/issues/831. They present a method by our great honnibal that calculates confidences for the spaCy NER extractions.
HOWEVER, after updating my `en_core_web_md` model to version `2.3.0`, the confidences I get for a lot of my NER extractions are `1.0` or `0.0` (before updating the model, I used to see all kinds of scores, ranging from `0.666...` to `0.999...`). Something seems off, and I wonder if I can trust those values. I also wonder how, internally, spaCy decides whether a NER entity is acceptable given its confidence (or whether confidence is taken into consideration at all).
Here is a snippet of my project where I get the confidences:
```python
import spacy
from collections import defaultdict

nlp = spacy.load('en_core_web_md')

# Run the pipeline without NER, then beam-parse the doc separately
with nlp.disable_pipes('ner'):
    doc = nlp(text)
beams = nlp.entity.beam_parse([doc], beam_width=16, beam_density=0.0001)

# Sum the beam probabilities for each candidate entity span
entity_scores = defaultdict(float)
for beam in beams:
    for score, ents in nlp.entity.moves.get_beam_parses(beam):
        for start, end, label in ents:
            entity_scores[(start, end, label)] += score
```
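Reading the accumulated scores back out per span then looks roughly like this (just a sketch for illustration; `start` and `end` are token indices into `doc`, and the print format is arbitrary):

```python
# Sketch: print each candidate span with its summed beam score,
# highest-scoring spans first.
for (start, end, label), score in sorted(entity_scores.items(),
                                         key=lambda kv: -kv[1]):
    print(f"{doc[start:end].text!r:30} {label:10} {score:.3f}")
```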
Am I doing anything incredibly wrong here?
PS: I wonder if we are going to have easy access to a confidence score for the entities in the near future? That would be great.
Environment
- Operating System: Windows 10 (but I’m also executing the code inside a container that runs Ubuntu and got the same issue)
- Python Version Used: 3.7.7
- spaCy Version Used: 2.3.0
Top GitHub Comments
An important related PR (WIP) is https://github.com/explosion/spaCy/pull/6747:
This PR is currently being held up because of another task we’re working on that slightly redesigns how the pipeline components set updates during training (internals only). We’re aiming to get this done by 3.1 though (but no promises 😉)
Hello @honnibal, @ines, and @svlandeg, apologies for the noise, but I have noticed that you have been discussing this issue, resolving/closing other related threads, and linking back here or elsewhere. I figured it would help to organize the discussion a little more and define the feature request as well as the use case (plus make some noise so it’s clear how valuable this would be!).
Disclaimer: if you are already actively working on this feature, then this post is merely a testament to the lack of visibility into that work, and the feature request can be ignored 😃.
Problem Statement
The problem, up front, is that (1) our team requires a robust way to return confidence scores for custom NER models, (2) spaCy does not seem to support this out of the box, and (3) the discussions on how to implement an effective solution are spread over a number of threads, making the “docs” on this difficult to traverse and making it hard to tell what the best or recommended approach is.
Use Case
We are beginning to train custom NER models and use them at scale, but multiple downstream applications have requested confidence scores for the NER model’s predictions in order to rank results and select/filter by highest probability or confidence score (see the sketch after this paragraph). Example downstream apps: search/recommendation systems, and AI planning systems that depend on a spaCy-powered information extraction service. Without a dedicated spaCy feature for this, we’ve resorted to investigating the beam-parse approach and suggesting that downstream apps use method stubs in the short term, but we’re still not sure of the right direction. Alternatives proposed: separate regression models that predict a score (very meta! but confusing/indirect/potentially ill-advised).
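For illustration, what our downstream consumers are asking for boils down to something like the following (a hypothetical sketch that reuses the `entity_scores` dict and `doc` from the beam-parse snippet in the original post; the 0.7 threshold is purely illustrative):

```python
# Hypothetical downstream filter: keep only spans whose summed beam score
# clears an (arbitrary) confidence threshold, ranked best-first.
CONFIDENCE_THRESHOLD = 0.7

ranked_entities = sorted(
    ((doc[start:end].text, label, score)
     for (start, end, label), score in entity_scores.items()
     if score >= CONFIDENCE_THRESHOLD),
    key=lambda item: item[2],
    reverse=True,
)
```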
Feature Request
A spaCy-supported solution, presented as a roadmap item for a future release: a supported method for scoring NER predictions by level of confidence, and/or a link to official public documentation on how to use spaCy functions/classes to score predictions.
I specifically call out the public documentation because these threads about using beam-parse are difficult to read and search through, and many solutions appear to be out of date. Perhaps a short-term workaround to the missing public docs could be to present or link to the latest and greatest method (beam-parse?) for scoring NER model predictions here in this thread or in a gist that gets updated periodically.
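In the meantime, the kind of consolidated, gist-style snippet I have in mind might look roughly like this (a sketch for spaCy 2.x only, wrapping the beam-parse approach from the original post; the function name and default parameters are my own, not an official spaCy API):

```python
from collections import defaultdict


def get_entity_confidences(nlp, text, beam_width=16, beam_density=0.0001):
    """Sketch: return (doc, {(start_token, end_token, label): summed beam score})
    using the spaCy 2.x beam-parse approach from the issues linked above."""
    with nlp.disable_pipes('ner'):
        doc = nlp(text)
    beams = nlp.entity.beam_parse([doc], beam_width=beam_width,
                                  beam_density=beam_density)
    scores = defaultdict(float)
    for beam in beams:
        for score, ents in nlp.entity.moves.get_beam_parses(beam):
            for start, end, label in ents:
                scores[(start, end, label)] += score
    return doc, dict(scores)
```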
Breadcrumb Trail
Here’s the trail of issues containing useful information about this topic, and a brief description of relevant information for each issue:
(I think it’s worth collecting these somewhere, mostly for my own reference and so other users can trace the breadcrumb trail, since the replies in these threads contain relevant context as well as implementation details for the beam-parse approach that aren’t mentioned on spaCy’s documentation site, but also for the authors of spaCy, so that they can see the path users like myself are taking and how easy it is to get confused.)
Unrelated to confidence scores: Thanks for developing and open-sourcing a great library, which is helping our team to resolve issues and collect information for patients more quickly and at a lower cost.