Feature request: analyzer should return the matched string and the original text
Describe the bug
From https://github.com/Microsoft/presidio/blob/master/docs/tutorial_service.md, Sample 4: Custom anonymization
*** Currently, the returned result is { "field": { "name": "US_DRIVER_LICENSE" }, "score": 0.65, "location": { "start": 176, "end": 184, "length": 8 } }
*** It would be good if the result also included the matched text, so that it is easier to debug:
{
  "field": {
    "name": "US_DRIVER_LICENSE"
  },
  "score": 0.65,
  "location": {
    "start": 176,
    "end": 184,
    "length": 8
  },
  "match_text": "AC333991"
}
*** It would also be good if the response included the original "text": { . . . "text": "John Smith lives in New York. We met yesterday morning in Seattle. I called him before on (212) 555-1234 to verify the appointment. He also told me that his drivers license is AC333991" }
To Reproduce
$ echo -n '{"text":"John Smith lives in New York. We met yesterday morning in Seattle. I called him before on (212) 555-1234 to verify the appointment. He also told me that his drivers license is AC333991", "analyzeTemplate":{"allFields":true} }' | http -F --verify=no https://192.168.1.44/api/v1/projects/1/analyze
Expected behavior N/A
Screenshots N/A
Additional context It is very common for an API to return the original text along with the matched string.
Issue Analytics
- Created 4 years ago
- Comments:5 (3 by maintainers)
Hi @teo-chenglim, this is definitely doable. The reason we didn't include it in the first place is that, in some cases, the next part of the pipeline is supposed to be "PII free", and the matched text is the PII entity itself. However, we'll give it some thought and update here if this gets implemented.
Is taking the original text from the request and extracting the matched substring using start and end indices an option?
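For anyone who wants to try this workaround, here is a minimal client-side sketch in Python. It assumes that `start` and `end` are character offsets into the request text, with `end` exclusive, which is consistent with the sample above (start 176, end 184, length 8). The helper names `extract_match` and `attach_match_text` are hypothetical and not part of Presidio; `match_text` is the field name proposed in this issue, and `results` is simply a list holding the result object quoted above.

```python
from typing import Any, Dict, List


def extract_match(text: str, location: Dict[str, int]) -> str:
    """Return the part of `text` covered by one result's location."""
    # Assumption: `start` is inclusive, `end` is exclusive, and both
    # count characters (not bytes) in the original request text.
    return text[location["start"]:location["end"]]


def attach_match_text(text: str, results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the results with a `match_text` field added."""
    enriched = []
    for result in results:
        item = dict(result)  # shallow copy, so the original result is not modified
        item["match_text"] = extract_match(text, result["location"])
        enriched.append(item)
    return enriched


# The text sent in the request from "To Reproduce".
request_text = (
    "John Smith lives in New York. We met yesterday morning in Seattle. "
    "I called him before on (212) 555-1234 to verify the appointment. "
    "He also told me that his drivers license is AC333991"
)

# The result object quoted in this issue, wrapped in a list.
results = [
    {
        "field": {"name": "US_DRIVER_LICENSE"},
        "score": 0.65,
        "location": {"start": 176, "end": 184, "length": 8},
    }
]

for item in attach_match_text(request_text, results):
    print(item["field"]["name"], item["match_text"])
```

Because the caller already holds the original text, this keeps the analyzer response itself free of PII, and only the debugging client sees the matched string.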
This issue is stale because it has been open 30 days with no activity.