feature request: analyser shall return the matching string and original text


Describe the bug

From https://github.com/Microsoft/presidio/blob/master/docs/tutorial_service.md, Sample 4: Custom anonymization

Currently, the result returned is: { "field": { "name": "US_DRIVER_LICENSE" }, "score": 0.65, "location": { "start": 176, "end": 184, "length": 8 } }

It would be good if the response also included the matched string, which would make debugging easier:

{
  "field": {
    "name": "US_DRIVER_LICENSE"
  },
  "score": 0.65,
  "location": {
    "start": 176,
    "end": 184,
    "length": 8
  },
  "match_text": "AC333991"
}

It would also be good if the response returned the original "text": { . . . "text": "John Smith lives in New York. We met yesterday morning in Seattle. I called him before on (212) 555-1234 to verify the appointment. He also told me that his drivers license is AC333991" }

To Reproduce

$ echo -n '{"text":"John Smith lives in New York. We met yesterday morning in Seattle. I called him before on (212) 555-1234 to verify the appointment. He also told me that his drivers license is AC333991", "analyzeTemplate":{"allFields":true}  }' | http -F --verify=no https://192.168.1.44/api/v1/projects/1/analyze

Expected behavior N/A

Screenshots N/A

Additional context

It is very common for an API to also return the original text plus the matching string.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
omri374 commented, May 2, 2019

Hi @teo-chenglim, this is definitely doable. The reason we didn't include it in the first place is that, in some cases, the next stage in the pipeline is supposed to be "PII free", and the matched text is the PII entity itself. However, we'll give it some thought and post an update if this gets implemented.

Is taking the original text from the request and extracting the matched substring using start and end indices an option?
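The workaround suggested above can be sketched on the client side as follows. This is a minimal illustration, not part of Presidio: the helper name `attach_match_text` is hypothetical, and it assumes that `start` and `end` are zero-based character offsets with `end` exclusive, which is consistent with `length` being 8 for the range 176 to 184 in the sample response.

```python
from typing import Any


def attach_match_text(text: str, results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the analyzer results with a "match_text" field added.

    Hypothetical helper: assumes zero-based offsets with an exclusive "end".
    """
    enriched = []
    for result in results:
        loc = result["location"]
        start, end = loc["start"], loc["end"]
        # Reject offsets that do not fall inside the submitted text.
        if not 0 <= start <= end <= len(text):
            raise ValueError(f"location {start}..{end} is outside the text")
        # Copy the result so the caller's original data is left untouched.
        enriched.append({**result, "match_text": text[start:end]})
    return enriched


# The text sent in the request and the result shape quoted in the issue.
text = (
    "John Smith lives in New York. We met yesterday morning in Seattle. "
    "I called him before on (212) 555-1234 to verify the appointment. "
    "He also told me that his drivers license is AC333991"
)
results = [
    {
        "field": {"name": "US_DRIVER_LICENSE"},
        "score": 0.65,
        "location": {"start": 176, "end": 184, "length": 8},
    }
]

enriched = attach_match_text(text, results)
print(enriched[0]["match_text"])
```

Because the caller already holds the original text, the matched string never has to travel back through the API, which keeps later pipeline stages free of PII as the maintainer describes.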

0 reactions
github-actions[bot] commented, Apr 11, 2020

This issue is stale because it has been open 30 days with no activity.
