DocumentAnalysisClient.AnalyzeDocumentAsync generates different results than FormRecognizerStudio

Library name and version

Azure.AI.FormRecognizer 4.0.0-beta.5

Query/Question

DocumentAnalysisClient.AnalyzeDocumentAsync generates different JSON than FormRecognizerStudio.

I'm currently updating our project from the old 2.1 FOTT interface to FormRecognizerStudio, so I repeated the steps of training our compound models. While doing so, I noticed that the JSON generated in the testing section of FormRecognizerStudio differs from what I get when I call DocumentAnalysisClient.AnalyzeDocumentAsync.

In addition, the example code suggested in FormRecognizerStudio is outdated and does not work with the new 4.0 model IDs.

The biggest difference is that the table fields I defined are null, while the same tables are recognized perfectly in the general "Tables" section of the JSON file.

FormRecognizerStudio :

        "Arbeitslohn": {
            "type": "array",
            "valueArray": [
                {
                    "type": "object",
                    "valueObject": {
                        "RC": {
                            "type": "string",
                            "valueString": "A",
                            "content": "A",
                            "boundingRegions": [
                                {
                                    "pageNumber": 2,
                                    "polygon": [ 0.794, 7.5526, 0.8589, 7.5526, 0.8589, 7.6246, 0.794, 7.6246 ]
                                }
                            ],
                            "confidence": 0,
                            "spans": [ { "offset": 6681, "length": 1 } ]
                        },
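For comparison, the rows of such a table field can be read straight from the Studio-style JSON. The following is only a sketch using System.Text.Json: `ReadTableField` is a hypothetical helper (not part of any SDK), and the sample is a trimmed, closed version of the fragment above, so the real result may contain more properties and value types than this handles.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: collects the cells of every row in an
// array-of-objects field ("valueArray" -> "valueObject").
static List<Dictionary<string, string>> ReadTableField(JsonElement fields, string fieldName)
{
    var rows = new List<Dictionary<string, string>>();
    if (!fields.TryGetProperty(fieldName, out JsonElement field) ||
        !field.TryGetProperty("valueArray", out JsonElement valueArray))
    {
        return rows; // field is missing or was not populated as an array
    }

    foreach (JsonElement row in valueArray.EnumerateArray())
    {
        var cells = new Dictionary<string, string>();
        if (row.ValueKind == JsonValueKind.Object &&
            row.TryGetProperty("valueObject", out JsonElement valueObject))
        {
            foreach (JsonProperty cell in valueObject.EnumerateObject())
            {
                // Prefer the typed string value; fall back to the raw "content".
                if (cell.Value.TryGetProperty("valueString", out JsonElement text) ||
                    cell.Value.TryGetProperty("content", out text))
                {
                    cells[cell.Name] = text.GetString() ?? "";
                }
            }
        }
        rows.Add(cells);
    }
    return rows;
}

// Assumed sample data: single quotes are swapped for double quotes
// only to keep the literal readable.
string sample = @"{
  'Arbeitslohn': {
    'type': 'array',
    'valueArray': [
      { 'type': 'object',
        'valueObject': {
          'RC': { 'type': 'string', 'valueString': 'A', 'content': 'A', 'confidence': 0 }
        } }
    ]
  }
}".Replace('\'', '"');

using JsonDocument doc = JsonDocument.Parse(sample);
List<Dictionary<string, string>> rows = ReadTableField(doc.RootElement, "Arbeitslohn");
foreach (Dictionary<string, string> row in rows)
{
    foreach (KeyValuePair<string, string> cell in row)
    {
        Console.WriteLine($"{cell.Key} = {cell.Value}"); // prints "RC = A"
    }
}
```

An empty list from this helper corresponds to the situation described in this issue, where the field exists but carries no array content.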

DocumentAnalysisClient.AnalyzeDocumentAsync:

        "Arbeitslohn": {
            "ValueType": 6,
            "Content": null,
            "BoundingRegions": [],
            "Spans": [],
            "Confidence": 0.0
        },

The top-level structure (the headers) of the JSON differs as well.

FormRecognizerStudio:

{
	"status": "succeeded",
	"createdDateTime": "2022-08-30T11:08:42Z",
	"lastUpdatedDateTime": "2022-08-30T11:09:36Z",
	"analyzeResult": {
		"apiVersion": "2022-08-31",
		"modelId": "dataudatex02",
		"stringIndexType": "utf16CodeUnit",
		"content": "redacted",
		"pages": [
			{
...

DocumentAnalysisClient.AnalyzeDocumentAsync:


    "ModelId": "dataudatex02",
    "Content": "Redacted",
    "Pages": [{
        "Unit": 1,
        "Kind": {},
        "PageNumber": 1,
        "Angle": 0.0,
        "Width": 8.2639,
        "Height": 11.6944,
        "Spans": [{
            "Offset": 0,
            "Length": 4538
        }],
        "Words": [{
            "BoundingPolygon": {
                "Length": 4
            },

My Code:

    // Requires: using Azure; using Azure.AI.FormRecognizer.DocumentAnalysis;
    // Wrap the uploaded bytes in a stream and dispose it when done.
    using Stream stream = new MemoryStream(input.Contents);

    string apiKey = input.Apikey;
    string endpoint = input.Endpoint;
    ModelId = input.CustomerCompoundID;

    DocumentAnalysisClient recognizerClient = new DocumentAnalysisClient(new Uri(endpoint), new AzureKeyCredential(apiKey));
    try
    {
        AnalyzeDocumentOperation operation = await recognizerClient.AnalyzeDocumentAsync(WaitUntil.Completed, ModelId, stream);

        AnalyzeResult forms = operation.Value;

        foreach (AnalyzedDocument document in forms.Documents)
        {
            Console.WriteLine($"Document of type: {document.DocType}");

            foreach (KeyValuePair<string, DocumentField> fieldKvp in document.Fields)
            {
                string fieldName = fieldKvp.Key;
                DocumentField field = fieldKvp.Value;

                Console.WriteLine($"Field '{fieldName}': ");

                Console.WriteLine($"  Content: '{field.Content}'");
                Console.WriteLine($"  Confidence: '{field.Confidence}'");
            }
        }
    }
    catch (Exception)
    {
        // Rethrow with the original stack trace ("throw e;" would reset it).
        throw;
    }

Any pointers on why my results differ when using the same model ID in two different environments are greatly appreciated.

Environment

Windows Desktop 10 .NET 6

Visual Studio Version 17.3.0 Preview 2.0

FormRecognizerStudio API Version 2022-08-31

Azure.AI.FormRecognizer 4.0.0-beta.5

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
kinelski commented, Aug 30, 2022

Hello @sH0221,

As @peterv85 pointed out, the difference you see is caused by a difference in service versions. Form Recognizer Studio uses the latest version 2022-08-31 of the service, while the SDK still uses 2022-06-30-preview. This issue can be fixed with the next update of the SDK that will come in early September.

If this is blocking you and you can’t wait for the new SDK version, you can also consume one of our alpha packages in the meantime. They already target the latest service version.

In order to consume the alpha package you need to use our NuGet Package Dev Feed (detailed instructions can be found in our CONTRIBUTING guide). The version you need to target is 4.0.0-alpha.20220829.3. Also, please check our CHANGELOG for an exhaustive list of changes included since our 4.0.0-beta.5 release.

I’ll keep this issue open and let both of you know when the next package is released.

1 reaction
peterv85 commented, Aug 30, 2022

I’ve seen the same: 2022-06-30-preview gives different results compared to 2022-08-31. I reverted to the previous version in Form Recognizer Studio (and rebuilt the model), and it is still not working 100%.

This will be fixed once an SDK release supports version 2022-08-31. I am considering just using the API directly. How long will it take to release the SDK update?
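For anyone weighing the "use the API directly" route, here is a minimal sketch of building the analyze request against service version 2022-08-31 with HttpClient types. It rests on assumptions: `BuildAnalyzeRequest` is a hypothetical helper, the endpoint and key are placeholders, and the URL path and `Ocp-Apim-Subscription-Key` header reflect my understanding of the 2022-08-31 REST API, so please check them against the REST reference before relying on this. The request is only built here, not sent.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper: builds (but does not send) the analyze request
// for a custom model, pinned to service version 2022-08-31.
static HttpRequestMessage BuildAnalyzeRequest(string endpoint, string modelId, string apiKey, byte[] document)
{
    string url = $"{endpoint.TrimEnd('/')}/formrecognizer/documentModels/" +
                 $"{Uri.EscapeDataString(modelId)}:analyze?api-version=2022-08-31";

    var message = new HttpRequestMessage(HttpMethod.Post, url)
    {
        // The document is sent as the raw request body.
        Content = new ByteArrayContent(document)
    };
    message.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
    message.Headers.Add("Ocp-Apim-Subscription-Key", apiKey);
    return message;
}

// Placeholder values for illustration only.
HttpRequestMessage analyzeRequest = BuildAnalyzeRequest(
    "https://example.cognitiveservices.azure.com/",
    "dataudatex02",
    "placeholder-key",
    new byte[] { 0x25, 0x50, 0x44, 0x46 });

Console.WriteLine(analyzeRequest.RequestUri);
```

As far as I know, the service answers such a request with 202 Accepted and an Operation-Location header. Polling that URL with the same key header until "status" is "succeeded" should return JSON in the same shape Form Recognizer Studio shows.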
