Support the UTC formatter in the JSON Reader

Is your feature request related to a problem? Please describe.

Support the UTC formatter in the JSON Reader.

Describe the solution you’d like

Option 1:

  • Try a few of the most popular formatters in turn, using try-catch to fall back to the next one
  • Minimal changes

Example:

String value = parser.getValueAsString();
LocalDateTime ldt;
try {
  // First try the existing formatter, which expects an explicit offset.
  ldt = OffsetDateTime.parse(value, DateUtility.isoFormatTimeStamp).toLocalDateTime();
} catch (DateTimeParseException e) {
  try {
    ldt = LocalDateTime.parse(value, DateUtility.utcFormatDateTime); // "yyyy-MM-dd'T'HH:mm:ss'Z'"
  } catch (DateTimeParseException e2) {
    ldt = LocalDateTime.parse(value, DateUtility.utcFormatTimeStamp); // "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
  }
}
OffsetDateTime utcDateTime = OffsetDateTime.of(ldt, ZoneOffset.UTC);
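
The fallback idea in option 1 can also be written as a loop over candidate formatters. The following is a minimal, self-contained sketch, not Drill code: the class FallbackTimestampParser and its constants are hypothetical stand-ins, and DateTimeFormatter.ofPattern is assumed to be close enough to the DateUtility formatters for illustration.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class FallbackTimestampParser {
  // Stand-in for DateUtility.isoFormatTimeStamp, using the pattern quoted in the issue.
  static final DateTimeFormatter ISO_WITH_OFFSET =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXX");

  // Hypothetical fallback formatters that treat the trailing Z as a literal.
  static final DateTimeFormatter[] UTC_LITERAL_FORMATS = {
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'"),
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
  };

  // Returns the date-time fields of the input. As in the snippet above,
  // an explicit offset is dropped rather than applied.
  static LocalDateTime parseLocal(String value) {
    try {
      return OffsetDateTime.parse(value, ISO_WITH_OFFSET).toLocalDateTime();
    } catch (DateTimeParseException first) {
      for (DateTimeFormatter formatter : UTC_LITERAL_FORMATS) {
        try {
          return LocalDateTime.parse(value, formatter);
        } catch (DateTimeParseException ignored) {
          // Fall through and try the next candidate.
        }
      }
      // No candidate matched: report the failure from the primary formatter.
      throw first;
    }
  }

  public static void main(String[] args) {
    System.out.println(parseLocal("2019-09-30T20:47:43Z"));
    System.out.println(parseLocal("2019-09-30T20:47:43.123+0000"));
  }
}
```

Rethrowing the first exception keeps the error message tied to the primary formatter, so callers that already handle DateTimeParseException should not need to change.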

Option 2:

  • Allow the user to define the UTC formatter through the ALTER SESSION SET syntax.
  • I am not sure whether the framework can be extended to support this.

Example (dummy):

ALTER SESSION SET `store.json.date_formatter` = "yyyy-MM-dd'T'HH:mm:ss'Z'"
LocalDateTime ldt;
if (hasUTCKeyword(parser.getValueAsString())) { // value.indexOf('T') > 0 && value.indexOf('Z') > value.indexOf('T')
  ldt = LocalDateTime.parse(parser.getValueAsString(), session_date_formatter);
} else {
  ldt = OffsetDateTime.parse(parser.getValueAsString(), session_date_formatter).toLocalDateTime();
}
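
A self-contained version of this dummy example might look like the sketch below. Everything in it is an assumption for illustration: the class SessionFormatSketch, the hasUtcKeyword helper (implemented as described in the comment above), and the way the pattern is passed in as a plain string in place of a real session option. Unlike the dummy code, values without a Z fall back to the fixed offset pattern here, since a pattern with a literal 'Z' could not parse them.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

final class SessionFormatSketch {
  // Fixed pattern quoted in the issue, used when the value carries an explicit offset.
  static final DateTimeFormatter OFFSET_FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXX");

  // True when the value has a 'T' separator followed later by a 'Z'.
  static boolean hasUtcKeyword(String value) {
    int t = value.indexOf('T');
    return t > 0 && value.indexOf('Z') > t;
  }

  // sessionPattern plays the role of the hypothetical store.json.date_formatter option.
  static LocalDateTime parse(String value, String sessionPattern) {
    if (hasUtcKeyword(value)) {
      return LocalDateTime.parse(value, DateTimeFormatter.ofPattern(sessionPattern));
    }
    // The offset is dropped, not applied, mirroring the dummy code above.
    return OffsetDateTime.parse(value, OFFSET_FORMAT).toLocalDateTime();
  }

  public static void main(String[] args) {
    String pattern = "yyyy-MM-dd'T'HH:mm:ss'Z'";
    System.out.println(parse("2019-09-30T20:47:43Z", pattern));
    System.out.println(parse("2019-09-30T20:47:43.000+0800", pattern));
  }
}
```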

Describe alternatives you’ve considered

None

Additional context

When a date is stored in Mongo as an ISODate (without a timezone, also called the zero timezone) and store.mongo.bson.record.reader is set to false:

{
  "_id" : ObjectId("5da7760149b3f000195cabb"),
  "date" : ISODate("2019-09-24T20:06:56Z")
}

Drill fails with the following error:

Caused by: java.lang.Exception: Text '2019-09-30T20:47:43Z' could not be parsed at index 19

This happens because OffsetDateTime parses the date string with the fixed formatter yyyy-MM-dd'T'HH:mm:ss.SSSXX. With that formatter, OffsetDateTime does not accept the UTC format ***T***Z (also called the zero timezone).

Example 1:

yyyy-MM-dd'T'HH:mm:ss.SSS'Z'

Example 2:

yyyy-MM-dd'T'HH:mm:ss'Z'
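
The reported failure can be reproduced outside Drill. The sketch below assumes that DateUtility.buildFormatter behaves like DateTimeFormatter.ofPattern for this pattern; ZuluParseRepro and errorIndex are hypothetical names. It reports the index at which parsing stops. For the value in the stack trace that index should be 19, the position right after the seconds, where the fixed pattern expects the .SSS fraction.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class ZuluParseRepro {
  // Pattern of the fixed formatter as quoted in the issue.
  static final String FIXED_PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSSXX";

  // Returns -1 when the text parses as an OffsetDateTime,
  // otherwise the index at which the parser gave up.
  static int errorIndex(String pattern, String text) {
    try {
      OffsetDateTime.parse(text, DateTimeFormatter.ofPattern(pattern));
      return -1;
    } catch (DateTimeParseException e) {
      return e.getErrorIndex();
    }
  }

  public static void main(String[] args) {
    // Value from the stack trace: no fractional seconds.
    System.out.println(errorIndex(FIXED_PATTERN, "2019-09-30T20:47:43Z"));
    // Same instant written with milliseconds.
    System.out.println(errorIndex(FIXED_PATTERN, "2019-09-30T20:47:43.000Z"));
  }
}
```

Comparing the two inputs helps separate a problem with the Z suffix from a problem with the missing milliseconds.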

Linked resources:

https://github.com/apache/drill/blob/39b565f112122734c080324fdcbef518ced16507/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/VectorOutput.java#L353-L357

https://github.com/apache/drill/blob/39b565f112122734c080324fdcbef518ced16507/exec/vector/src/main/java/org/apache/drill/exec/expr/fn/impl/DateUtility.java#L635-L635

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
paul-rogers commented, Aug 22, 2021

@luocooong, I may be confused, but as I read the Mongo spec, it does want the UTC “Zulu” format: 2019-09-30T20:47:43Z. I verified that this format is tested and does work correctly in the new JSON loader.

When you say “without a timezone”, I think you are describing a date/time of the form 2019-09-30T20:47:43. This would be a local time. Drill uses local time internally, but Mongo (wisely) seems to use UTC time. Thus, if you are reading Mongo data, you should not see a local time (if I understand Mongo correctly). By the way, 2019-09-30T20:47:43Z does have a time zone: it is zero offset, also called GMT.

Your testing shows that the “Zulu” format is broken in the old JSON parser. Let’s try to figure out why it is broken.

I looked at the code in your call stack. It is pretty convoluted – another reason for the new JSON parser. The JSON parser tries to handle all maps the same. Mongo extended types are, syntactically, a JSON map. The MapVectorOutput.run() method checks for Mongo keywords. For the date/time keyword, the code then calls VectorOutput$MapVectorOutput.writeTimestamp. I suspect this is where things went wrong. The VectorOutput$MapVectorOutput.writeTimestamp method is not unique to Mongo JSON, it is a generic vector method. As you note, at present it uses the isoFormatTime constant in DateUtility:

  public static final DateTimeFormatter isoFormatTime     = buildFormatter("HH:mm:ss.SSSXX");

For Mongo, it should use the UTC_FORMATTER constant:

  public static final DateTimeFormatter UTC_FORMATTER = buildFormatter("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");

Checking the file history, it looks like the following commit broke things: “DRILL-6242 Use java.time.Local{Date|Time|DateTime} for Drill Date, Time, Timestamp types.” (Use “Blame” on the VectorOutput.java file.) My guess is that the author wanted to make sure Drill used only local times, and did not realize that he was breaking Mongo which requires ISO “Zulu” timestamps.

A quick check of the code suggests that only the old JsonReader uses this code path. So, you can try reverting the code to use the UTC_FORMATTER and rerun unit tests. Also check against your Mongo test case. If both of these work, then this is the simplest fix.

Now, it could be that something in the tests uses a Mongo-format date/time, but with a Drill-like local time. If so, then we can look at the problem and think about how to solve it. Let’s see if running the tests tells us if we even have this problem.

0 reactions
luocooong commented, Aug 22, 2021

@paul-rogers Thanks for the information. However, the old JSON loader uses the fixed date formatter yyyy-MM-dd'T'HH:mm:ss.SSSXX, which requires a timezone (or offset), and the OffsetDateTime class does not accept a date string without a timezone.

https://github.com/apache/drill/blob/39b565f112122734c080324fdcbef518ced16507/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/VectorOutput.java#L353-L354

So the old JSON loader cannot parse 2019-09-30T20:47:43Z with this fixed formatter. Interestingly, 2019-09-30T20:47:43Z is equal to 2019-09-30T20:47:43+0000 (Z is the zero timezone), but OffsetDateTime does not recognize this. In that case, do we need to update the new JSON loader?
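
Whether Z and +0000 are treated as the same offset can be checked directly with java.time. This is a minimal sketch, assuming DateTimeFormatter.ofPattern stands in for Drill's buildFormatter; ZuluEquivalence and sameInstant are hypothetical names. When the milliseconds are present, the XX pattern letter reads both Z and +0000 as a zero offset, and the built-in ISO_OFFSET_DATE_TIME formatter accepts the form without milliseconds.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

final class ZuluEquivalence {
  // Fixed pattern quoted in the comment above.
  static final DateTimeFormatter FIXED =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXX");

  // Parses both strings with the given formatter and compares the instants.
  static boolean sameInstant(String a, String b, DateTimeFormatter formatter) {
    return OffsetDateTime.parse(a, formatter)
        .isEqual(OffsetDateTime.parse(b, formatter));
  }

  public static void main(String[] args) {
    // With milliseconds, the fixed pattern handles both spellings of a zero offset.
    System.out.println(sameInstant(
        "2019-09-30T20:47:43.000Z", "2019-09-30T20:47:43.000+0000", FIXED));
    // Without milliseconds, the built-in ISO formatter can be used instead.
    System.out.println(sameInstant(
        "2019-09-30T20:47:43Z", "2019-09-30T20:47:43+00:00",
        DateTimeFormatter.ISO_OFFSET_DATE_TIME));
  }
}
```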
