DocumentAnalysisClient.AnalyzeDocumentAsync generates different results than FormRecognizerStudio
Library name and version
Azure.AI.FormRecognizer 4.0.0-beta.5
Query/Question
DocumentAnalysisClient.AnalyzeDocumentAsync generates different JSON than FormRecognizerStudio.
I'm currently updating our project from the old 2.1 FOTT interface to FormRecognizerStudio. As part of that, I repeated the steps of training our compound models. While doing so, I noticed that the JSON generated in the Testing section of FormRecognizerStudio differs from what I get when I call DocumentAnalysisClient.AnalyzeDocumentAsync.
In addition, the example code proposed in FormRecognizerStudio is outdated for use with the new 4.0 model IDs.
The biggest difference is that the table fields I defined come back null, even though the same tables are recognized correctly in the general “Tables” section of the JSON file.
FormRecognizerStudio:
"Arbeitslohn": { "type": "array", "valueArray": [ { "type": "object", "valueObject": { "RC": { "type": "string", "valueString": "A", "content": "A", "boundingRegions": [ { "pageNumber": 2, "polygon": [ 0.794, 7.5526, 0.8589, 7.5526, 0.8589, 7.6246, 0.794, 7.6246 ] } ], "confidence": 0, "spans": [ { "offset": 6681, "length": 1 } ] },
DocumentAnalysisClient.AnalyzeDocumentAsync:
"Arbeitslohn": {
"ValueType": 6,
"Content": null,
"BoundingRegions": [],
"Spans": [],
"Confidence": 0.0
},
The headers of the JSON differ as well:
FormRecognizerStudio:
{
"status": "succeeded",
"createdDateTime": "2022-08-30T11:08:42Z",
"lastUpdatedDateTime": "2022-08-30T11:09:36Z",
"analyzeResult": {
"apiVersion": "2022-08-31",
"modelId": "dataudatex02",
"stringIndexType": "utf16CodeUnit",
"content": "redacted",
"pages": [
{
...
DocumentAnalysisClient.AnalyzeDocumentAsync:
"ModelId": "dataudatex02",
"Content": "Redacted",
"Pages": [{
"Unit": 1,
"Kind": {},
"PageNumber": 1,
"Angle": 0.0,
"Width": 8.2639,
"Height": 11.6944,
"Spans": [{
"Offset": 0,
"Length": 4538
}],
"Words": [{
"BoundingPolygon": {
"Length": 4
},
My code:
`
using System;
using System.Collections.Generic;
using System.IO;
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;

// Dispose the stream once the analysis has finished.
using Stream stream = new MemoryStream(input.Contents);
string apiKey = input.Apikey;
string endpoint = input.Endpoint;
ModelId = input.CustomerCompoundID;
DocumentAnalysisClient recognizerClient = new DocumentAnalysisClient(new Uri(endpoint), new AzureKeyCredential(apiKey));
try
{
    // Wait for the long-running analyze operation to complete.
    AnalyzeDocumentOperation operation = await recognizerClient.AnalyzeDocumentAsync(WaitUntil.Completed, ModelId, stream);
    AnalyzeResult forms = operation.Value;
    foreach (AnalyzedDocument document in forms.Documents)
    {
        Console.WriteLine($"Document of type: {document.DocType}");
        foreach (KeyValuePair<string, DocumentField> fieldKvp in document.Fields)
        {
            string fieldName = fieldKvp.Key;
            DocumentField field = fieldKvp.Value;
            Console.WriteLine($"Field '{fieldName}': ");
            Console.WriteLine($" Content: '{field.Content}'");
            Console.WriteLine($" Confidence: '{field.Confidence}'");
        }
    }
}
catch (Exception)
{
    // Rethrow with "throw;" so the original stack trace is preserved.
    throw;
}
`
Any pointers on why my results differ when using the same model ID in two different environments would be greatly appreciated.
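For readers comparing the two outputs: in the Studio JSON above, the array field carries its rows under valueArray and valueObject rather than in a top-level content value, so printing only Content will not show the rows even when they are present. The following is a minimal sketch of walking nested fields, assuming a later SDK release (4.0.0 GA naming, where the field type is exposed as `FieldType` and values are read through `field.Value`); in 4.0.0-beta.5 the property names differ (the serialized output above shows `ValueType`). `FieldPrinter` is a hypothetical helper name, not part of the SDK.

```csharp
using System;
using System.Collections.Generic;
using Azure.AI.FormRecognizer.DocumentAnalysis;

// Hypothetical helper; assumes the 4.0.0 GA property names (FieldType, Value).
public static class FieldPrinter
{
    public static void Print(string name, DocumentField field, int indent = 0)
    {
        string pad = new string(' ', indent);
        if (field == null)
        {
            Console.WriteLine($"{pad}{name}: (no value)");
            return;
        }

        if (field.FieldType == DocumentFieldType.List)
        {
            // Table-like fields arrive as a list of row objects.
            Console.WriteLine($"{pad}{name}: list");
            int row = 0;
            foreach (DocumentField item in field.Value.AsList())
            {
                Print($"[{row++}]", item, indent + 2);
            }
        }
        else if (field.FieldType == DocumentFieldType.Dictionary)
        {
            // Each row is a dictionary of column name to cell field.
            Console.WriteLine($"{pad}{name}: dictionary");
            foreach (KeyValuePair<string, DocumentField> cell in field.Value.AsDictionary())
            {
                Print(cell.Key, cell.Value, indent + 2);
            }
        }
        else
        {
            Console.WriteLine($"{pad}{name}: '{field.Content}' (confidence {field.Confidence})");
        }
    }
}
```

Inside the loop over `document.Fields`, this would be called as `FieldPrinter.Print(fieldKvp.Key, fieldKvp.Value);`. It does not by itself resolve a service-version mismatch between the SDK and the Studio.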
Environment
Windows Desktop 10 .NET 6
Visual Studio Version 17.3.0 Preview 2.0
FormRecognizerStudio API Version 2022-08-31
Azure.AI.FormRecognizer 4.0.0-beta.5
Top GitHub Comments
Hello @sH0221,
As @peterv85 pointed out, the difference you see is caused by a difference in service versions. Form Recognizer Studio uses the latest version 2022-08-31 of the service, while the SDK still uses 2022-06-30-preview. This issue can be fixed with the next update of the SDK that will come in early September.
If this is blocking you and you can’t wait for the new SDK version, you can also consume one of our alpha packages in the meantime. They already target the latest service version.
In order to consume the alpha package you need to use our NuGet Package Dev Feed (detailed instructions can be found in our CONTRIBUTING guide). The version you need to target is 4.0.0-alpha.20220829.3. Also, please check our CHANGELOG for an exhaustive list of changes included since our 4.0.0-beta.5 release.
I’ll keep this issue open and let both of you know when the next package is released.
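Once a package that targets 2022-08-31 is installed, the service version can also be pinned explicitly so that the client and the Studio are known to use the same one. This is a minimal sketch, assuming an SDK release whose `DocumentAnalysisClientOptions.ServiceVersion` enum includes `V2022_08_31` (to my knowledge 4.0.0 GA does; 4.0.0-beta.5 does not). The environment variable names are hypothetical.

```csharp
using System;
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;

// Hypothetical environment variable names; substitute your own configuration source.
string endpoint = Environment.GetEnvironmentVariable("FORM_RECOGNIZER_ENDPOINT");
string apiKey = Environment.GetEnvironmentVariable("FORM_RECOGNIZER_API_KEY");

// Assumption: the installed SDK exposes V2022_08_31 in its ServiceVersion enum.
var options = new DocumentAnalysisClientOptions(
    DocumentAnalysisClientOptions.ServiceVersion.V2022_08_31);

var client = new DocumentAnalysisClient(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey),
    options);
```

Pinning the version this way keeps results stable if a later SDK update changes its default service version.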
I’ve seen the same: 2022-06-30-preview gives different results compared to 2022-08-31. I reverted to the previous version in Form Recognizer Studio (and rebuilt the model), and it is still not working 100%.
This will be fixed once the SDK is released with support for version 2022-08-31. I am considering just using the API directly, as I am not sure how long it will take to release the SDK update.
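For anyone weighing the same workaround, calling the service directly could look roughly like the sketch below. It assumes the 2022-08-31 REST shape: a POST to `{endpoint}/formrecognizer/documentModels/{modelId}:analyze`, the key in the `Ocp-Apim-Subscription-Key` header, and an `Operation-Location` response header that is polled until `status` is `succeeded`. `DirectAnalyze` and its method names are hypothetical, and the two-second polling interval is an arbitrary choice.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper that bypasses the SDK and talks to the REST API.
public static class DirectAnalyze
{
    // Builds the analyze URL for a resource endpoint, model ID, and API version.
    public static Uri BuildAnalyzeUri(string endpoint, string modelId, string apiVersion) =>
        new Uri($"{endpoint.TrimEnd('/')}/formrecognizer/documentModels/" +
                $"{Uri.EscapeDataString(modelId)}:analyze?api-version={apiVersion}");

    // Submits the document, polls the operation, and returns the raw result JSON.
    public static async Task<string> AnalyzeAsync(
        HttpClient http, string endpoint, string apiKey, string modelId, Stream document)
    {
        using var submit = new HttpRequestMessage(
            HttpMethod.Post, BuildAnalyzeUri(endpoint, modelId, "2022-08-31"));
        submit.Headers.Add("Ocp-Apim-Subscription-Key", apiKey);
        submit.Content = new StreamContent(document);
        submit.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

        using HttpResponseMessage accepted = await http.SendAsync(submit);
        accepted.EnsureSuccessStatusCode();

        // The service answers with a URL to poll for the result.
        string resultUrl = accepted.Headers.GetValues("Operation-Location").First();

        while (true)
        {
            await Task.Delay(TimeSpan.FromSeconds(2));

            using var poll = new HttpRequestMessage(HttpMethod.Get, resultUrl);
            poll.Headers.Add("Ocp-Apim-Subscription-Key", apiKey);
            using HttpResponseMessage response = await http.SendAsync(poll);
            response.EnsureSuccessStatusCode();

            string json = await response.Content.ReadAsStringAsync();
            using JsonDocument parsed = JsonDocument.Parse(json);
            string? status = parsed.RootElement.GetProperty("status").GetString();

            if (status == "succeeded") return json;
            if (status == "failed") throw new InvalidOperationException("Analysis failed: " + json);
            // Any other status (for example "running") means keep polling.
        }
    }
}
```

The returned string has the same layout as the Studio output shown in the question (`status`, `analyzeResult`, and so on), which makes a side-by-side comparison straightforward.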