Missing identification results

Describe the bug

I have a custom entity A with the regex ‘AAA’, another custom entity B with the regex ‘BBB’ … and a custom entity D with the regex ‘DDD’. My code looks like this:

from presidio_analyzer import Pattern, PatternRecognizer

# analyzer: an AnalyzerEngine set up with a Stanza NLP engine; text: the input document
analyzer.registry.add_recognizer(PatternRecognizer(supported_entity='A', patterns=[Pattern('A', 'AAA', score=0.4)]))
analyzer.registry.add_recognizer(PatternRecognizer(supported_entity='B', patterns=[Pattern('B', 'BBB', score=0.4)]))
analyzer.registry.add_recognizer(PatternRecognizer(supported_entity='C', patterns=[Pattern('C', 'CCC', score=0.4)]))
analyzer.registry.add_recognizer(PatternRecognizer(supported_entity='D', patterns=[Pattern('D', 'DDD', score=0.4)]))
analyzer.analyze(text, 'en', score_threshold=0.5)
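The maintainers attribute this to all recognizers sharing the same name. To see how that can happen, here is a minimal, hypothetical model in plain Python (not Presidio's source). It assumes, as in Presidio's EntityRecognizer, that a recognizer created without an explicit name falls back to its class name, and that a later step looks a recognizer up by that name. ToyPatternRecognizer, find_by_name and the context words are illustrative only.

```python
class ToyPatternRecognizer:
    """Hypothetical stand-in for a pattern recognizer, not Presidio's class."""

    def __init__(self, supported_entity, regex, context=None, name=None):
        self.supported_entity = supported_entity
        self.regex = regex
        self.context = context or []
        # Fall back to the class name when no name is given.
        self.name = name if name else self.__class__.__name__


def find_by_name(recognizers, name):
    """Return the first recognizer whose name matches."""
    for recognizer in recognizers:
        if recognizer.name == name:
            return recognizer
    return None


recognizers = [
    ToyPatternRecognizer("A", "AAA", context=["alpha"]),
    ToyPatternRecognizer("B", "BBB", context=["bravo"]),
    ToyPatternRecognizer("C", "CCC", context=["charlie"]),
    ToyPatternRecognizer("D", "DDD", context=["delta"]),
]

# Every instance shares one name, so a name-based lookup for a "D" match
# returns the "A" recognizer and, with it, the wrong context words.
print({r.name for r in recognizers})  # {'ToyPatternRecognizer'}
print(find_by_name(recognizers, recognizers[3].name).supported_entity)  # A
```

With a single registered recognizer the lookup cannot pick the wrong one, which matches the report that one PatternRecognizer works on its own.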

My problem is: I’m sure the text contains something that should be detected, and with only one PatternRecognizer added it works. But when I add multiple PatternRecognizer instances, the entities they should detect are missing from the result.

If I replace PatternRecognizer with a custom recognizer, the entities are found, even when analyzer.registry contains multiple instances of that custom recognizer.

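The custom recognizer looks like this:

import re

from presidio_analyzer import AnalysisExplanation, EntityRecognizer, RecognizerResult


class RegexRecognizer(EntityRecognizer):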
    expected_confidence_level = 0.4

    def __init__(self, regex, entity, **kwargs):
        kwargs['supported_entities'] = [entity]
        kwargs['supported_language'] = 'zh'
        super().__init__(**kwargs)
        self.regex = regex
        self.entity = entity

    def load(self) -> None:
        pass

    def analyze(self, text: str, entities, nlp_artifacts):
        try:
            regex = re.compile(self.regex)
        except re.error:
            # An invalid pattern produces no results
            return []
        # One result per match; start()/end() give the character offsets
        return [
            RecognizerResult(entity_type=self.entity,
                             start=match.start(),
                             end=match.end(),
                             score=self.expected_confidence_level,
                             analysis_explanation=AnalysisExplanation(self.entity, self.expected_confidence_level))
            for match in regex.finditer(text)
        ]
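Stripped of the Presidio types, the analyze() method above boils down to compiling a regex and turning each match into a span. A minimal sketch in plain Python, using the documented Match.start() and Match.end() methods; the find_spans helper, the tuple output and the permit-like pattern are hypothetical (the reporter's actual regex is not shown).

```python
import re


def find_spans(pattern, entity, text, score=0.4):
    """Return (entity, start, end, score) for every regex match in text.

    Hypothetical helper mirroring RegexRecognizer.analyze above.
    """
    try:
        regex = re.compile(pattern)
    except re.error:
        # An invalid pattern yields no results, as in the recognizer.
        return []
    # end() is exclusive, so end - start equals the match length.
    return [(entity, m.start(), m.end(), score) for m in regex.finditer(text)]


print(find_spans(r"C[0-9A-Z]{8}", "PERMIT", "ids: CA0000001 C00007565"))
# [('PERMIT', 5, 14, 0.4), ('PERMIT', 15, 24, 0.4)]
```

Note that this recognizer always reports the fixed 0.4 score; nothing in it consults context words.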

My NLP model is Stanza, and each PatternRecognizer has a different context. On the same text, when only one recognizer is registered, the 港澳通行证 (Hong Kong and Macau travel permit) entity is returned as:

{'港澳通行证': [{'content': 'CA0000001', 'end': 1339, 'score': 0.75, 'start': 1330},
           {'content': 'C00007565', 'end': 1349, 'score': 0.75, 'start': 1340}]}

When many recognizers are registered, the same matches come back with a lower score:

{'港澳通行证': [{'content': 'CA0000001', 'end': 1339, 'score': 0.4, 'start': 1330},
           {'content': 'C00007565', 'end': 1349, 'score': 0.4, 'start': 1340}]}
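The two outputs differ only in score. One plausible reading, assuming Presidio's default context enhancement (a context similarity factor of 0.35 added to the base score, clamped between 0.4 and 1.0, and applied only when a context word is found): with one recognizer the context words are found and 0.4 becomes 0.75; with many, the wrong recognizer's context is consulted, the score stays at 0.4, and score_threshold=0.5 then drops the result. The sketch reproduces only that arithmetic; enhance, apply_threshold and the context_found flag are hypothetical.

```python
def enhance(score, context_found, factor=0.35, floor=0.4, cap=1.0):
    """Hypothetical model of a context boost: add factor, clamp to [floor, cap]."""
    if not context_found:
        # No supporting context word: the score is left unchanged.
        return score
    return min(max(score + factor, floor), cap)


def apply_threshold(scores, threshold=0.5):
    """Keep only scores at or above the threshold."""
    return [s for s in scores if s >= threshold]


one_recognizer = [enhance(0.4, context_found=True)] * 2
many_recognizers = [enhance(0.4, context_found=False)] * 2

print([round(s, 2) for s in one_recognizer])  # [0.75, 0.75]
print(len(apply_threshold(one_recognizer)))   # 2
print(apply_threshold(many_recognizers))      # []
```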

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
omri374 commented, Jul 5, 2022

Yes, that’s exactly it. The bug arises because all recognizers have the same name. Passing a name would definitely fix it, but we’re looking into changing the implementation so that each recognizer is treated individually. Making the name mandatory would not be backward compatible.
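Treating each recognizer individually can be sketched by keying lookups on a per-instance identifier instead of the shared name. This is a hypothetical illustration in plain Python; the identifier format (name plus id()) is an assumption, not necessarily what Presidio adopted.

```python
class ToyRecognizer:
    """Hypothetical recognizer; name defaults to the class name."""

    def __init__(self, entity, context, name=None):
        self.entity = entity
        self.context = context
        self.name = name or type(self).__name__
        # Unique per live instance, even when names collide.
        self.id = f"{self.name}_{id(self)}"


recognizers = [ToyRecognizer("A", ["alpha"]), ToyRecognizer("D", ["delta"])]

by_name = {r.name: r for r in recognizers}  # later entries overwrite earlier ones
by_id = {r.id: r for r in recognizers}      # one entry per instance

print(len(by_name), len(by_id))             # 1 2
print(by_id[recognizers[0].id].context)     # ['alpha']
```

Because the identifier is derived internally, existing callers would not need to pass a name, which fits the backward-compatibility concern above.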

0 reactions
omri374 commented, Aug 6, 2022

Hi @hummingbird1989, yes this is likely the same bug. Potentially just changing the name of the recognizer would solve this.
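In Presidio, PatternRecognizer accepts a name argument, so the suggested workaround amounts to something like PatternRecognizer(supported_entity='A', name='A_recognizer', patterns=[...]) for each entity. The plain-Python sketch below (hypothetical helper build_specs and naming scheme, not Presidio code) derives a distinct name from each entity and shows that a name-keyed lookup then keeps every recognizer.

```python
def build_specs(regexes):
    """Return one recognizer spec per entity, each with a distinct name.

    Hypothetical: each dict stands in for the keyword arguments one would
    pass to a recognizer constructor.
    """
    return [
        {"supported_entity": entity, "regex": regex, "name": f"{entity}_recognizer"}
        for entity, regex in regexes.items()
    ]


specs = build_specs({"A": "AAA", "B": "BBB", "C": "CCC", "D": "DDD"})
names = [spec["name"] for spec in specs]

# With distinct names, a name-keyed lookup keeps all four entries.
lookup = {spec["name"]: spec for spec in specs}
print(names)  # ['A_recognizer', 'B_recognizer', 'C_recognizer', 'D_recognizer']
print(lookup["D_recognizer"]["regex"])  # DDD
```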
