Confidence Score for NER
It is important for my current project to output a confidence score for each entity we get from NER. Doing my research, I found some existing issues on this topic. These were particularly useful for me: https://github.com/explosion/spaCy/issues/3996, https://github.com/explosion/spaCy/issues/881, https://github.com/explosion/spaCy/issues/831. They present a method by our great honnibal that calculates confidences for the spaCy NER extractions.
HOWEVER, after updating my `en_core_web_md` model to version `2.3.0`, the confidences I get for a lot of my NER extractions are `1.0` or `0.0` (before updating the model, I used to see all kinds of scores, ranging from `0.666...` to `0.999...`). Something seems off, and I wonder if I can trust those values. I also wonder how, internally, spaCy decides whether a NER entity is acceptable given its confidence (or whether confidence is taken into consideration at all).
Here is a snippet of my project where I get the confidences:
```python
import spacy
from collections import defaultdict

nlp = spacy.load('en_core_web_md')

# Run the pipeline without NER, then beam-parse the doc separately
with nlp.disable_pipes('ner'):
    doc = nlp(text)
beams = nlp.entity.beam_parse([doc], beam_width=16, beam_density=0.0001)

# Sum the beam probabilities for each candidate entity span
entity_scores = defaultdict(float)
for beam in beams:
    for score, ents in nlp.entity.moves.get_beam_parses(beam):
        for start, end, label in ents:
            entity_scores[(start, end, label)] += score
```
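Reading the accumulated scores back out per span then looks roughly like this (just a sketch for illustration; `start` and `end` are token indices into `doc`, and the print format is arbitrary):

```python
# Sketch: print each candidate span with its summed beam score,
# highest-scoring spans first.
for (start, end, label), score in sorted(entity_scores.items(),
                                         key=lambda kv: -kv[1]):
    print(f"{doc[start:end].text!r:30} {label:10} {score:.3f}")
```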
Am I doing anything incredibly wrong here?
PS: I wonder if we are going to have easy access to a confidence score for the entities in the near future? That would be great.
Environment
- Operating System: Windows 10 (but I’m also executing the code inside a container that runs Ubuntu and got the same issue)
- Python Version Used: 3.7.7
- spaCy Version Used: 2.3.0
Top GitHub Comments
An important related PR (WIP) is https://github.com/explosion/spaCy/pull/6747:
This PR is currently being held up because of another task we’re working on that slightly redesigns how the pipeline components set updates during training (internals only). We’re aiming to get this done by 3.1 though (but no promises 😉)
Hello @honnibal, @ines, and @svlandeg, apologies for the noise, but I have noticed that you have been discussing this issue, resolving/closing other related threads, and linking back here or elsewhere. I figured it would help to organize the discussion a little more and define the feature request as well as the use case (plus make some noise so it’s clear how valuable this would be!).
Disclaimer: if you are already actively working on this feature, then this post is merely a testament to the lack of visibility into that work, and the feature request can be ignored 😃.
Problem Statement
The problem, up front, is that (1) our team requires a robust way to return confidence scores for custom NER models, (2) spaCy does not seem to support this out of the box, and (3) the discussions on how to implement an effective solution are spread over a number of threads, making the “docs” on this difficult to traverse and making it hard to tell what the best or recommended approach is.
Use Case
We are beginning to train custom NER models and use them at scale, but multiple downstream applications have requested confidence scores for the NER model’s predictions in order to rank results and select/filter by highest probability or confidence score (see the sketch after this paragraph). Example downstream apps: search/recommendation systems, and AI planning systems that depend on a spaCy-powered information extraction service. Without a dedicated spaCy feature for this, we’ve resorted to investigating the beam-parse approach and suggesting that downstream apps use method stubs in the short term, but we’re still not sure of the right direction. Alternatives proposed: separate regression models that predict a score (very meta! but confusing/indirect/potentially ill-advised).
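For illustration, what our downstream consumers are asking for boils down to something like the following (a hypothetical sketch that reuses the `entity_scores` dict and `doc` from the beam-parse snippet in the original post; the 0.7 threshold is purely illustrative):

```python
# Hypothetical downstream filter: keep only spans whose summed beam score
# clears an (arbitrary) confidence threshold, ranked best-first.
CONFIDENCE_THRESHOLD = 0.7

ranked_entities = sorted(
    ((doc[start:end].text, label, score)
     for (start, end, label), score in entity_scores.items()
     if score >= CONFIDENCE_THRESHOLD),
    key=lambda item: item[2],
    reverse=True,
)
```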
Feature Request
A spaCy-supported solution, presented as a roadmap item for a future release: a supported method for scoring NER predictions by level of confidence, and/or a link to official public documentation on how to use spaCy functions/classes to score predictions.
I specifically call out the public documentation because these threads about using beam-parse are difficult to read and search through, and many solutions appear to be out of date. Perhaps a short-term workaround to the missing public docs could be to present or link to the latest and greatest method (beam-parse?) for scoring NER model predictions here in this thread or in a gist that gets updated periodically.
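In the meantime, the kind of consolidated, gist-style snippet I have in mind might look roughly like this (a sketch for spaCy 2.x only, wrapping the beam-parse approach from the original post; the function name and default parameters are my own, not an official spaCy API):

```python
from collections import defaultdict


def get_entity_confidences(nlp, text, beam_width=16, beam_density=0.0001):
    """Sketch: return (doc, {(start_token, end_token, label): summed beam score})
    using the spaCy 2.x beam-parse approach from the issues linked above."""
    with nlp.disable_pipes('ner'):
        doc = nlp(text)
    beams = nlp.entity.beam_parse([doc], beam_width=beam_width,
                                  beam_density=beam_density)
    scores = defaultdict(float)
    for beam in beams:
        for score, ents in nlp.entity.moves.get_beam_parses(beam):
            for start, end, label in ents:
                scores[(start, end, label)] += score
    return doc, dict(scores)
```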
Breadcrumb Trail
Here’s the trail of issues containing useful information about this topic, and a brief description of relevant information for each issue:
(I think it’s worth collecting these somewhere, mostly for my own reference and so other users can trace the breadcrumb trail, since the replies in these threads contain relevant context as well as implementation details for the beam-parse approach that aren’t mentioned on spaCy’s documentation site, but also for the authors of spaCy, so that they can see the path users like myself are taking and how easy it is to get confused.)
Unrelated to confidence scores: Thanks for developing and open-sourcing a great library, which is helping our team to resolve issues and collect information for patients more quickly and at a lower cost.