
Mismatched scores returned from AnalyzerEngine


Describe the bug
For some inputs, the results returned by the analyzer have unexpected scores of 1.0 under the score attribute, despite no context words being present in the input at all. These scores differ from the scores listed under analysis_explanation when return_decision_process=True (the latter are the expected scores).

To Reproduce

An example:

from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()
text = 'You can call my phone 907-882-3534'
results = analyzer.analyze(text, language='en', return_decision_process=True)

print(results)
> [type: UK_NHS, start: 22, end: 34, score: 1.0, type: PHONE_NUMBER, start: 22, end: 34, score: 0.75]
# UK_NHS has a score of 1.0 despite no context words in the input and a default pattern score of 0.5

print([i.score for i in results])
> [1.0, 0.75]

print([i.analysis_explanation.score for i in results])
> [0.5, 0.75]
# 0.5 is the expected score for the UK_NHS entity
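
For reference, the decision process attached to each result can also be printed to see where the 1.0 comes from. A minimal sketch, assuming the AnalysisExplanation fields validation_result and textual_explanation described in the decision-process documentation:

from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()
results = analyzer.analyze('You can call my phone 907-882-3534',
                           language='en', return_decision_process=True)

for r in results:
    expl = r.analysis_explanation
    # Final score on the result vs. the score recorded in the decision process
    print(r.entity_type, r.score, expl.score)
    # validation_result is populated when the recognizer ran a checksum/validation step
    print('  validation:', expl.validation_result)
    print('  explanation:', expl.textual_explanation)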

Expected behavior
Scores should match for both attributes (score and analysis_explanation.score).

Additional context

presidio-analyzer==2.2.27
presidio-anonymizer==2.2.27

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
liaeh commented, Apr 4, 2022

@omri374 sure, I’ll create a PR to update the docs.

I agree it would be nice to have the confidence more configurable. In this scenario, since result validation boosts the score straight to MAX_SCORE, there is no room left for context to be taken into consideration.
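
To illustrate that interaction, here is a toy recognizer (the entity name, regex, and base score are made up for illustration): when validate_result returns True, the analyzer raises the result's score to MAX_SCORE regardless of the pattern's base score, so a later context enhancement has nothing left to add.

from presidio_analyzer import AnalyzerEngine, Pattern, PatternRecognizer

# Hypothetical pattern: any 6-digit number, with a deliberately low base score
pattern = Pattern(name='six_digits', regex=r'\b\d{6}\b', score=0.3)

class DemoRecognizer(PatternRecognizer):
    def validate_result(self, pattern_text):
        # Pretend the checksum always passes; the analyzer then boosts the
        # score to MAX_SCORE (1.0), leaving no room for context words to add anything
        return True

analyzer = AnalyzerEngine()
analyzer.registry.add_recognizer(DemoRecognizer(supported_entity='DEMO_ID', patterns=[pattern]))

print([(r.entity_type, r.score) for r in analyzer.analyze('id 123456', language='en', entities=['DEMO_ID'])])
# expected: [('DEMO_ID', 1.0)] - boosted from the 0.3 base score by the passing validation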

1 reaction
omri374 commented, Apr 4, 2022

Since a checksum is a pretty strong guarantee that a number belongs to a specific entity, we set the confidence to 1.0. Having said that, it could be that a phone number accidentally passes the checksum of another entity. One thing we can do is make the confidence value more configurable.
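
Until then, a possible workaround is to narrow the requested entities or drop the recognizer producing the false positive. A sketch, assuming the UK NHS recognizer is registered under the name 'NhsRecognizer' (the name may differ between presidio versions):

from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()
text = 'You can call my phone 907-882-3534'

# Option 1: only ask for the entities you actually care about
print(analyzer.analyze(text, language='en', entities=['PHONE_NUMBER']))

# Option 2: remove the recognizer that produces the false positive
analyzer.registry.remove_recognizer('NhsRecognizer')
print(analyzer.analyze(text, language='en'))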

