[QUESTION] Why does the TextAnalytics SDK not return the same results as Language Studio?
Library name and version
Azure.AI.TextAnalytics 5.1.1
Query/Question
Hi there,
I am currently writing an Azure function in C# which uses the TextAnalytics SDK to detect IBAN values. To detect these values I am using the following code:
var client = new TextAnalyticsClient(new Uri(endpoint), new AzureKeyCredential(apiKey));
var options = new RecognizePiiEntitiesOptions();
options.CategoriesFilter.Add(PiiEntityCategory.InternationalBankingAccountNumber);
// Pass the options so that the IBAN category filter is actually applied.
var piiResponse = await client.RecognizePiiEntitiesAsync(myQueueItem, "en", options);
PiiEntityCollection piiEntities = piiResponse.Value;
myQueueItem is a string that contains multiple sample IBANs. This is the string:
NL24 ABNA 4047 6339 76 NL61 INGB 0003 3505 63 NL18RABO0123459876 NL98INGB0003856625 NL98ABNA0416961347 NL98UGBI0771565860 NL98TRIO0254712320 NL98SNSB0908532792 NL97DEUT0265134951 NL97BNPA0227673409 NL97BNGH0285061917 NL97BOFA0266546412
The problem is that not all the IBANs are detected. However, when I test the same string in Language Studio, all the IBANs are detected.
C# SDK Results:
Language Studio Results:
Environment
.NET SDK (reflecting any global.json): Version: 6.0.202 Commit: f8a55617d2
Runtime Environment: OS Name: Windows OS Version: 10.0.19042 OS Platform: Windows RID: win10-x64 Base Path: C:\Program Files\dotnet\sdk\6.0.202\
Host (useful for support): Version: 6.0.4 Commit: be98e88c76
.NET SDKs installed: 6.0.202 [C:\Program Files\dotnet\sdk]
.NET runtimes installed: Microsoft.AspNetCore.App 6.0.4 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.NETCore.App 6.0.4 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.WindowsDesktop.App 6.0.4 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
IDE Version: 17.1.6
Issue Analytics
- Created a year ago
- Comments:9 (5 by maintainers)
Top GitHub Comments
Thanks for the help! The problem was indeed how the content of myQueueItem was handled: I passed the entire myQueueItem instead of only the property that I wanted analyzed.
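The thread does not show what the queue message looked like, so the following is only a minimal sketch of the fix described above. It assumes the message is a JSON object and that the text to analyze sits in a string property; the class name `QueueMessageText`, the method `ExtractText`, and the property name `text` are hypothetical and do not come from the original issue.

```csharp
#nullable enable
using System;
using System.Text.Json;

public static class QueueMessageText
{
    // Returns the string value of `propertyName` from a JSON queue message,
    // or null if the message is not a JSON object or has no such string property.
    // ASSUMPTION: the queue message is JSON; the issue does not confirm this.
    public static string? ExtractText(string queueItem, string propertyName)
    {
        try
        {
            using JsonDocument doc = JsonDocument.Parse(queueItem);
            JsonElement root = doc.RootElement;

            // TryGetProperty throws on non-object roots, so check the kind first.
            if (root.ValueKind == JsonValueKind.Object &&
                root.TryGetProperty(propertyName, out JsonElement value) &&
                value.ValueKind == JsonValueKind.String)
            {
                return value.GetString();
            }
        }
        catch (JsonException)
        {
            // Not valid JSON: the caller decides whether to analyze the raw message.
        }
        return null;
    }
}
```

With a helper like this, the function would pass `QueueMessageText.ExtractText(myQueueItem, "text") ?? myQueueItem` to `RecognizePiiEntitiesAsync` instead of the whole message, so the service only sees the text that is meant to be analyzed.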
Great, thank you! So both Language Studio and the SDK are using the same service version. I tried to reproduce your scenario using the SDK, and I get the 12 PII entities that Language Studio is showing.
I’m using the code present in the PII sample:
Output:
The only difference from our code is how the text is passed. Looking at the output, it seems like the content in myQueueItem is not properly handling End of Line/File, as those are the 5 entities that are missing.
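One way to check a suspicion like this is to log the exact string that is sent to the service with its control characters made visible. The helper below is a hypothetical diagnostic sketch, not part of the SDK or of the original thread; the names `TextDiagnostics` and `ShowControlChars` are made up for illustration.

```csharp
using System;
using System.Text;

public static class TextDiagnostics
{
    // Replaces control characters with visible escapes so that stray line
    // breaks or tabs in the analyzed text show up in logs.
    public static string ShowControlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            switch (c)
            {
                case '\r': sb.Append("\\r"); break;
                case '\n': sb.Append("\\n"); break;
                case '\t': sb.Append("\\t"); break;
                default:
                    if (char.IsControl(c))
                    {
                        // Any other control character is shown as \uXXXX.
                        sb.Append("\\u").Append(((int)c).ToString("X4"));
                    }
                    else
                    {
                        sb.Append(c);
                    }
                    break;
            }
        }
        return sb.ToString();
    }
}
```

Writing `TextDiagnostics.ShowControlChars(myQueueItem)` to the function log before calling the client shows whether the input contains line breaks, a JSON wrapper, or other content that the Language Studio test string did not have.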