Missing identification results

Describe the bug: I have a custom entity A with the regex 'AAA', a custom entity B with the regex 'BBB', and so on up to a custom entity D with the regex 'DDD'. My code looks like this:

from presidio_analyzer import Pattern, PatternRecognizer

# analyzer: an existing AnalyzerEngine (its Stanza-based setup is not shown here)
# text: the document being analyzed (not included in the report)
analyzer.registry.add_recognizer(PatternRecognizer(supported_entity='A', patterns=[Pattern('A', 'AAA', score=0.4)]))
analyzer.registry.add_recognizer(PatternRecognizer(supported_entity='B', patterns=[Pattern('B', 'BBB', score=0.4)]))
analyzer.registry.add_recognizer(PatternRecognizer(supported_entity='C', patterns=[Pattern('C', 'CCC', score=0.4)]))
analyzer.registry.add_recognizer(PatternRecognizer(supported_entity='D', patterns=[Pattern('D', 'DDD', score=0.4)]))
analyzer.analyze(text, 'en', score_threshold=0.5)

My problem: I'm sure the text contains something that should be detected, and when I add only one PatternRecognizer it works. But when I add multiple PatternRecognizer instances, the entities they are supposed to find do not appear in the results.

If I replace PatternRecognizer with a custom recognizer, it works, even when analyzer.registry contains multiple instances of the custom recognizer.

The custom recognizer looks like this:

import re

from presidio_analyzer import AnalysisExplanation, EntityRecognizer, RecognizerResult


class RegexRecognizer(EntityRecognizer):
    expected_confidence_level = 0.4

    def __init__(self, regex, entity, **kwargs):
        kwargs['supported_entities'] = [entity]
        kwargs['supported_language'] = 'zh'
        super().__init__(**kwargs)
        self.regex = regex
        self.entity = entity

    def load(self) -> None:
        # Nothing to load: this recognizer only uses a regex.
        pass

    def analyze(self, text: str, entities, nlp_artifacts):
        try:
            regex = re.compile(self.regex)
        except re.error:
            # Invalid pattern: report no matches instead of failing.
            return []
        # One result per regex match, with the match's character span.
        return [
            RecognizerResult(entity_type=self.entity,
                             start=match.start(),
                             end=match.end(),
                             score=self.expected_confidence_level,
                             analysis_explanation=AnalysisExplanation(self.entity, self.expected_confidence_level))
            for match in regex.finditer(text)
        ]

My NLP model is Stanza, and each PatternRecognizer has a different context. With the same text, when only one recognizer is registered, the results for the entity 港澳通行证 (Hong Kong and Macao travel permit) are:

{'港澳通行证': [{'content': 'CA0000001', 'end': 1339, 'score': 0.75, 'start': 1330},
           {'content': 'C00007565', 'end': 1349, 'score': 0.75, 'start': 1340}]}

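When many recognizers are registered, the same matches are returned with score 0.4 instead of 0.75, which is below score_threshold=0.5: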

'港澳通行证': [{'content': 'CA0000001', 'end': 1339, 'score': 0.4, 'start': 1330},
           {'content': 'C00007565', 'end': 1349, 'score': 0.4, 'start': 1340}],
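The difference between the two outputs is 0.75 - 0.4 = 0.35, which matches what I understand to be Presidio's default context similarity factor. The sketch below is a simplified model of that boost and of the threshold filter, not Presidio's actual code; the constants (0.35 boost, 0.4 minimum score with context, 1.0 cap) and the `>=` comparison are assumptions. It shows why losing the context boost makes the results disappear at score_threshold=0.5.

```python
# Simplified model of context-based score enhancement (assumed defaults,
# not Presidio's actual implementation).
CONTEXT_SIMILARITY_FACTOR = 0.35  # assumed boost when context words are found
MIN_SCORE_WITH_CONTEXT = 0.4      # assumed floor after a context match
MAX_SCORE = 1.0                   # scores are capped at 1.0


def boosted_score(base_score: float, context_found: bool) -> float:
    """Return the score after (optional) context enhancement."""
    if not context_found:
        return base_score
    score = base_score + CONTEXT_SIMILARITY_FACTOR
    score = max(score, MIN_SCORE_WITH_CONTEXT)
    return min(score, MAX_SCORE)


def passes_threshold(score: float, threshold: float) -> bool:
    """Mimic score_threshold filtering: keep results scoring at least the threshold."""
    return score >= threshold


single = boosted_score(0.4, context_found=True)   # one recognizer: context applied
multi = boosted_score(0.4, context_found=False)   # many recognizers: context lost
print(round(single, 2), passes_threshold(single, 0.5))
print(multi, passes_threshold(multi, 0.5))
```

With these assumptions the boosted score clears the 0.5 threshold and the unboosted 0.4 does not, which lines up with the two outputs above.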

Top GitHub Comments

omri374 commented, Jul 5, 2022

Yes, that's exactly it. The bug arises because all recognizers have the same name. Passing a name would definitely fix it, but we're looking into changing the implementation so that each recognizer is treated individually. Making the name mandatory would not be backward compatible.
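To make the name collision concrete, here is a toy model in plain Python. It is not Presidio's code: the classes, the default name string, and the "take the first recognizer with a matching name" lookup are hypothetical assumptions chosen to illustrate the failure mode. When a result only remembers its recognizer's name and several recognizers share that name, the result for D picks up the context words of A.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyRecognizer:
    """Hypothetical stand-in for a recognizer: a name, an entity and context words."""
    name: str
    entity: str
    context: List[str] = field(default_factory=list)


@dataclass
class ToyResult:
    """Hypothetical stand-in for a result that only remembers its recognizer's name."""
    entity: str
    recognizer_name: str


def find_recognizer(result: ToyResult,
                    recognizers: List[ToyRecognizer]) -> Optional[ToyRecognizer]:
    """Look a recognizer up by name and take the first match (the assumed failure mode)."""
    matches = [r for r in recognizers if r.name == result.recognizer_name]
    return matches[0] if matches else None


# All recognizers share one default name (the string here is illustrative).
same_name = [
    ToyRecognizer("PatternRecognizer", "A", ["context for A"]),
    ToyRecognizer("PatternRecognizer", "D", ["context for D"]),
]
result_d = ToyResult(entity="D", recognizer_name="PatternRecognizer")
print(find_recognizer(result_d, same_name).context)  # A's context: the wrong one

# Unique names make the lookup unambiguous.
unique_name = [
    ToyRecognizer("A_recognizer", "A", ["context for A"]),
    ToyRecognizer("D_recognizer", "D", ["context for D"]),
]
result_d = ToyResult(entity="D", recognizer_name="D_recognizer")
print(find_recognizer(result_d, unique_name).context)  # D's context
```

In this model a single recognizer never collides with anything, which is consistent with the report that one PatternRecognizer works while several do not.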

omri374 commented, Aug 6, 2022

Hi @hummingbird1989, yes, this is likely the same bug. Potentially, just changing the name of the recognizer would solve it.