Missing identification results
Describe the bug
I have a custom entity A with the regex ‘AAA’, another custom entity B with the regex ‘BBB’, and so on, up to a custom entity D with the regex ‘DDD’. My code looks like this:
analyzer.registry.add_recognizer(PatternRecognizer(supported_entity='A', patterns=[Pattern('A', 'AAA', score=0.4)]))
analyzer.registry.add_recognizer(PatternRecognizer(supported_entity='B', patterns=[Pattern('B', 'BBB', score=0.4)]))
analyzer.registry.add_recognizer(PatternRecognizer(supported_entity='C', patterns=[Pattern('C', 'CCC', score=0.4)]))
analyzer.registry.add_recognizer(PatternRecognizer(supported_entity='D', patterns=[Pattern('D', 'DDD', score=0.4)]))
analyzer.analyze(text, 'en', score_threshold=0.5)
My problem: I’m sure the text contains something that can be detected, and when I add only one PatternRecognizer it works. But when I add multiple PatternRecognizer instances, the matches from the added recognizers are missing from the results.
If I replace PatternRecognizer with a custom recognizer, detection works even when analyzer.registry contains multiple instances of it.
The custom recognizer looks like this:
import re

from presidio_analyzer import AnalysisExplanation, EntityRecognizer, RecognizerResult


class RegexRecognizer(EntityRecognizer):
    # Fixed score assigned to every regex match.
    expected_confidence_level = 0.4

    def __init__(self, regex, entity, **kwargs):
        kwargs['supported_entities'] = [entity]
        kwargs['supported_language'] = 'zh'
        super().__init__(**kwargs)
        self.regex = regex
        self.entity = entity

    def load(self) -> None:
        # Nothing to load for a plain regex recognizer.
        pass

    def analyze(self, text: str, entities, nlp_artifacts):
        try:
            regex = re.compile(self.regex)
        except Exception:
            # An invalid pattern yields no results instead of raising.
            return []
        else:
            return [
                RecognizerResult(entity_type=self.entity,
                                 start=re_result.start(),
                                 end=re_result.end(),
                                 score=self.expected_confidence_level,
                                 analysis_explanation=AnalysisExplanation(self.entity, self.expected_confidence_level))
                for re_result in regex.finditer(text)
            ]
My NLP model is stanza, and every PatternRecognizer has a different context. With the same text, if only one recognizer is registered:
{'港澳通行证': [{'content': 'CA0000001', 'end': 1339, 'score': 0.75, 'start': 1330},
{'content': 'C00007565', 'end': 1349, 'score': 0.75, 'start': 1340}]}
When many recognizers are registered:
'港澳通行证': [{'content': 'CA0000001', 'end': 1339, 'score': 0.4, 'start': 1330},
{'content': 'C00007565', 'end': 1349, 'score': 0.4, 'start': 1340}],
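The two outputs above suggest why the matches go missing. The sketch below simply applies the reported score_threshold=0.5 to the scores shown in each output: matches scored 0.75 survive, while the same matches scored 0.4 are dropped. The `above_threshold` helper is hypothetical and only mimics a threshold filter; it is not Presidio's own code. Reading the 0.4 to 0.75 difference as a context-based boost that is applied in one case and not the other is an interpretation, not something the report states.

```python
SCORE_THRESHOLD = 0.5  # value passed to analyzer.analyze(...) in the report

# (content, score) pairs copied from the two outputs above.
single_recognizer = [("CA0000001", 0.75), ("C00007565", 0.75)]
many_recognizers = [("CA0000001", 0.4), ("C00007565", 0.4)]


def above_threshold(results, threshold=SCORE_THRESHOLD):
    """Hypothetical helper: keep only matches whose score reaches the threshold."""
    return [content for content, score in results if score >= threshold]


kept_single = above_threshold(single_recognizer)
kept_many = above_threshold(many_recognizers)

print("one recognizer:  ", kept_single)
print("many recognizers:", kept_many)
```

With these numbers, both matches are kept in the single-recognizer case and none in the multi-recognizer case, which matches the symptom described.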
Issue Analytics
- State:
- Created a year ago
- Comments:8 (4 by maintainers)
Yes, that’s exactly it. The bug arises from the fact that all recognizers have the same name. Passing a name would definitely fix it, but we’re looking into changing the implementation so that each recognizer would be treated individually. Making the name mandatory would not be backward compatible.
Hi @hummingbird1989, yes this is likely the same bug. Potentially just changing the name of the recognizer would solve this.
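The workaround mentioned in these replies, giving each recognizer its own name, might look like the following minimal sketch. It assumes the `name` keyword argument of `PatternRecognizer`; the `recognizer_kwargs` and `register_all` helpers and the `<entity>_pattern_recognizer` naming scheme are hypothetical, chosen only for illustration. Whether this fully resolves the issue on a given Presidio version is not confirmed here.

```python
from typing import Dict, List

# Entity -> regex pairs taken from the report above.
ENTITY_PATTERNS: Dict[str, str] = {"A": "AAA", "B": "BBB", "C": "CCC", "D": "DDD"}


def recognizer_kwargs(entity: str, regex: str, score: float = 0.4) -> dict:
    """Hypothetical helper: describe one recognizer with a unique name."""
    return {
        "supported_entity": entity,
        # Illustrative naming scheme; any string unique per recognizer should do.
        "name": f"{entity}_pattern_recognizer",
        "pattern": {"name": entity, "regex": regex, "score": score},
    }


def register_all(analyzer, patterns: Dict[str, str] = ENTITY_PATTERNS) -> List[str]:
    """Register one uniquely named PatternRecognizer per entity."""
    # Imported lazily so recognizer_kwargs stays usable without Presidio installed.
    from presidio_analyzer import Pattern, PatternRecognizer

    names = []
    for entity, regex in patterns.items():
        kw = recognizer_kwargs(entity, regex)
        recognizer = PatternRecognizer(
            supported_entity=kw["supported_entity"],
            name=kw["name"],  # explicit name instead of the shared default
            patterns=[Pattern(**kw["pattern"])],
        )
        analyzer.registry.add_recognizer(recognizer)
        names.append(recognizer.name)
    return names
```

Calling `register_all(analyzer)` in place of the four `add_recognizer` lines from the report would register the same patterns, each under a distinct name.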