[BUG] Fatal parse error analysing document using invoice model - "not recognized as a valid DateTime"
Library name and version
Azure.AI.FormRecognizer 4.0.0-beta.3
Describe the bug
Send a document to the analyser for ‘invoice’ processing. The service responds without error, but the SDK throws an exception while deserializing the response.
Expected behavior
The SDK should return the analysed document information, with best efforts at recognising data types. This should be a TRY parse, not fail everything because of one dubious value. The analysis model should be flexible enough to return values just as text if they are ‘date-ish’ or ‘number-ish’.
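To illustrate the kind of "try parse" behaviour meant here, below is a minimal sketch using only the .NET base class library. It is not the SDK's code: the `LenientDate` class and `TryParseDate` method are hypothetical names made up for this example. It uses the same culture and styles that a strict parse would, but reports failure as `null` instead of throwing, so the caller can fall back to the raw text.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Azure.AI.FormRecognizer.
public static class LenientDate
{
    // Returns the parsed date when the text is a complete date,
    // or null when it is only "date-ish" (for example "yyyy-08-21").
    public static DateTimeOffset? TryParseDate(string text)
    {
        if (DateTimeOffset.TryParse(
                text,
                CultureInfo.InvariantCulture,
                DateTimeStyles.AssumeUniversal,
                out DateTimeOffset parsed))
        {
            return parsed;
        }

        // Parsing failed: signal "no typed value" rather than throwing.
        return null;
    }
}
```

With a helper like this, `LenientDate.TryParseDate("yyyy-08-21")` yields `null` and the caller keeps the original string, while a complete date such as `"2022-08-21"` yields a typed value.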
Actual behavior
SDK throws System.FormatException:
The string 'yyyy-08-21' was not recognized as a valid DateTime. There is an unknown word starting at index '0'.
at System.DateTimeParse.Parse(ReadOnlySpan`1 s, DateTimeFormatInfo dtfi, DateTimeStyles styles, TimeSpan& offset)
at System.DateTimeOffset.Parse(String input, IFormatProvider formatProvider, DateTimeStyles styles)
at Azure.Core.TypeFormatters.ParseDateTimeOffset(String value, String format)
at Azure.AI.FormRecognizer.DocumentAnalysis.DocumentField.DeserializeDocumentField(JsonElement element)
at Azure.AI.FormRecognizer.DocumentAnalysis.DocumentField.DeserializeDocumentField(JsonElement element)
at Azure.AI.FormRecognizer.DocumentAnalysis.DocumentField.DeserializeDocumentField(JsonElement element)
at Azure.AI.FormRecognizer.DocumentAnalysis.AnalyzedDocument.DeserializeAnalyzedDocument(JsonElement element)
at Azure.AI.FormRecognizer.DocumentAnalysis.AnalyzeResult.DeserializeAnalyzeResult(JsonElement element)
at Azure.AI.FormRecognizer.DocumentAnalysis.AnalyzeResultOperation.DeserializeAnalyzeResultOperation(JsonElement element)
at Azure.AI.FormRecognizer.DocumentAnalysis.DocumentAnalysisRestClient.<GetAnalyzeDocumentResultAsync>d__11.MoveNext()
Reproduction Steps
I am submitting a financial document, so I won’t provide the source data, but the stack trace should be sufficient to trace the root cause. This is a vanilla call to the client:
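try
{
    // Start the analysis with the prebuilt invoice model and wait for the service to finish.
    var apiResponse = await _documentAnalysisClient.StartAnalyzeDocumentFromUriAsync("prebuilt-invoice", uri);
    await apiResponse.WaitForCompletionAsync();
    return apiResponse.Value;
}
catch (Exception e)
{
    // The FormatException thrown by the SDK while deserializing the result is caught here.
    log.LogError($"{e.GetType()}\n{e.Message}\n{e.StackTrace}");
}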
Environment
.NET SDK (reflecting any global.json):
  Version: 6.0.200
  Commit: 4c30de7899
Runtime Environment:
  OS Name: Windows
  OS Version: 10.0.22000
  OS Platform: Windows
  RID: win10-x64
  Base Path: C:\Program Files\dotnet\sdk\6.0.200\
Host (useful for support):
  Version: 6.0.2
  Commit: 839cdfb0ec
Issue Analytics
- Created 2 years ago
- Comments: 9 (6 by maintainers)
Top GitHub Comments
Thanks @kinelski, I am now able to retrieve the document analysis value with the SDK, without error.
@kweebtronic Apologies for the delayed response. I have discussed this matter with the service team and confirmed it’s a bug. They already have a fix but deployment is expected to take around two weeks, so I’ll get back to you when that happens.
Once the fix is in place, you won’t be able to access the field date value with DocumentField.AsDate as our samples suggest. This only affects “incomplete” dates that can’t be parsed by the SDK, such as “yyyy-08-21”. In these cases, you’ll need to access the string representation of the date in DocumentField.Content and parse it in your code if necessary.
If you’re blocked by this bug and need a fix asap, you could use an HTTP policy to intercept the service response and manually remove the dates causing the bug:
You need to set it in the client options like this:
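The snippets that accompanied this comment are not preserved in this copy. As a rough sketch of what such a workaround could look like (not the maintainer's original code), the example below assumes that the service returns dates under a `valueDate` JSON property, that the response body has been buffered by the pipeline, and that the project targets .NET 6 so `System.Text.Json.Nodes` is available. The names `RemoveUnparsableDatesPolicy` and `ClientFactory` are hypothetical.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;
using System.Text.Json.Nodes;
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;
using Azure.Core;
using Azure.Core.Pipeline;

// Hypothetical policy: drops "valueDate" properties whose text is not a parsable date.
public sealed class RemoveUnparsableDatesPolicy : HttpPipelineSynchronousPolicy
{
    public override void OnReceivedResponse(HttpMessage message)
    {
        var body = message.Response.ContentStream;
        // Only a buffered (seekable) body can be inspected and replaced safely.
        if (body == null || !body.CanSeek) return;

        body.Position = 0;
        JsonNode root;
        try
        {
            root = JsonNode.Parse(body);
        }
        catch (JsonException)
        {
            // Empty or non-JSON body: rewind and leave it untouched.
            body.Position = 0;
            return;
        }

        if (root == null)
        {
            body.Position = 0;
            return;
        }

        Scrub(root);
        byte[] cleaned = Encoding.UTF8.GetBytes(root.ToJsonString());
        message.Response.ContentStream = new MemoryStream(cleaned);
    }

    // Walks the JSON tree and removes every "valueDate" that fails to parse.
    private static void Scrub(JsonNode node)
    {
        if (node is JsonObject obj)
        {
            if (obj.TryGetPropertyValue("valueDate", out var value)
                && value is JsonValue jsonValue
                && jsonValue.TryGetValue<string>(out var text)
                && !DateTimeOffset.TryParse(text, CultureInfo.InvariantCulture,
                        DateTimeStyles.AssumeUniversal, out _))
            {
                obj.Remove("valueDate");
            }

            // Copy the children first so the object is not enumerated while it changes.
            foreach (var child in obj.Select(pair => pair.Value).ToList())
            {
                Scrub(child);
            }
        }
        else if (node is JsonArray array)
        {
            foreach (var item in array)
            {
                Scrub(item);
            }
        }
    }
}

// Hypothetical factory showing how the policy is registered in the client options.
public static class ClientFactory
{
    public static DocumentAnalysisClient Create(Uri endpoint, string apiKey)
    {
        var options = new DocumentAnalysisClientOptions();
        options.AddPolicy(new RemoveUnparsableDatesPolicy(), HttpPipelinePosition.PerCall);
        return new DocumentAnalysisClient(endpoint, new AzureKeyCredential(apiKey), options);
    }
}
```

A field whose date was removed this way no longer has a typed date value, so read its text from DocumentField.Content instead of DocumentField.AsDate. Parsing every response a second time adds overhead, so a policy like this is best treated as a temporary workaround until the service fix is deployed.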