Problem with encoding dates in serialization.py

I think the accepted standard in OrientDB is to write and read dates as POSIX/Unix time, which means they should be encoded and decoded as if they occurred at midnight UTC. If you use the OrientDB console:

CREATE CLASS Event EXTENDS V  
CREATE VERTEX Event SET name='Example', date=Date('1970-01-01 00:00:00')
SELECT @RID, name, date.asLong() FROM Event

As expected, you get 0L for the stored long value.

In serializations.py, dates are encoded and decoded using local time. This means that if the client reading a date field with pyorient is in a timezone west of the client that wrote the value with pyorient, it gets a different date. For example, if you encode 1970-01-01 with the current serializations.py in New York, it actually writes 18000000 to the DB. If you then decode that value as a date with pyorient in California, you get 1969-12-31. Note that in the process pyorient discards the time portion of the timestamp, so the two dates are truly different.
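The New York / California example can be reproduced without changing the machine's timezone. The following is only a sketch: the helper names are hypothetical, and fixed UTC offsets (UTC-5 and UTC-8) stand in for the two clients on the assumption that no daylight saving time applies in January. The helpers mimic what time.mktime and date.fromtimestamp would do on each client.

```python
from datetime import date, datetime, timedelta, timezone

# Fixed offsets standing in for the two clients (assumption: no DST in January).
NEW_YORK = timezone(timedelta(hours=-5))
CALIFORNIA = timezone(timedelta(hours=-8))


def encode_like_mktime(value, client_tz):
    """Milliseconds for midnight of `value` in the client's zone, i.e. what
    time.mktime(value.timetuple()) * 1000 would yield on that client."""
    midnight = datetime(value.year, value.month, value.day, tzinfo=client_tz)
    return int(midnight.timestamp()) * 1000


def decode_like_fromtimestamp(millis, client_tz):
    """Calendar date the client sees, i.e. what
    date.fromtimestamp(millis / 1000) would yield on that client."""
    return datetime.fromtimestamp(millis / 1000, client_tz).date()


stored = encode_like_mktime(date(1970, 1, 1), NEW_YORK)
read_back = decode_like_fromtimestamp(stored, CALIFORNIA)

print(stored)     # 18000000
print(read_back)  # 1969-12-31
```

The written value is five hours past the epoch, and a reader three hours further west sees it as 21:00 on the previous day, which is then truncated to 1969-12-31.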

I can see why this may evade tests: if the test process that writes the data and the test process that reads it are in the same timezone, they return the same value, even though the value stored in the DB is still technically wrong. Any tests on dates should be updated to check the long value stored in the DB, i.e. writing 1970-01-01 should store 0L.

Both the encoding and the decoding need to be fixed:

Decoding (lines 322:324 of serializations.py) should change from:

        if c == 'a':
            collected = date.fromtimestamp(float(collected) / 1000)
            content = content[1:]

to

        if c == 'a':
            collected = datetime.utcfromtimestamp(float(collected) / 1000).date()
            content = content[1:]

and encoding (lines 131:132 of serializations.py) should change from:

        elif isinstance(value, date):
            ret = str(int(time.mktime(value.timetuple())) * 1000) + 'a'

to

        elif isinstance(value, date):
            ret = str(int(calendar.timegm(value.timetuple())) * 1000) + 'a'

For the latter change, you’ll also need to import calendar. calendar.timegm() assumes the date is in UTC, while time.mktime() assumes it is in local time. For 1970-01-01 encoded in New York, the resulting values written are 0 (correct) and 18000000 (incorrect for OrientDB) respectively.
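As a rough illustration, the two proposed changes can be pulled out into standalone functions and checked against the expected stored value. This is a sketch, not the pyorient code itself: encode_date and decode_date are hypothetical names, and datetime.fromtimestamp(..., timezone.utc) is used here in place of datetime.utcfromtimestamp, which yields the same calendar date but is not deprecated in recent Python versions.

```python
import calendar
from datetime import date, datetime, timezone


def encode_date(value):
    """Encode a date as milliseconds since the epoch at midnight UTC,
    followed by the 'a' type suffix used in the snippets above."""
    return str(calendar.timegm(value.timetuple()) * 1000) + 'a'


def decode_date(collected):
    """Decode a millisecond string (without the 'a' suffix) as a UTC date."""
    return datetime.fromtimestamp(float(collected) / 1000, timezone.utc).date()


print(encode_date(date(1970, 1, 1)))  # 0a
print(decode_date('0'))               # 1970-01-01
```

Because neither function consults the local timezone, the result is the same regardless of where the writing and reading clients run.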

Issue Analytics

  • State: open
  • Created 8 years ago
  • Comments: 13 (4 by maintainers)

Top GitHub Comments

2 reactions
tvashtar commented, Apr 11, 2016

“i think that the responsibility of writing and decoding the dates in the right Timezone and format, must be delegated to the applications.” I 100% agree, but pyorient doesn’t let you do that. It forces both the writing and the reading client to encode and decode using the local timezone (by using time.mktime and date.fromtimestamp). Currently that means it is impossible to use pyorient to store dates if you might have reading clients that are west of you: all dates will be wrong. Or do you have a suggestion on how to use pyorient to do this? The only way I can think of is very hacky: give every date a separate timezone property recording where it was written, which also has to be read, and if you are reading further west, increment the date returned by pyorient by one.

0 reactions
nikulukani commented, Jul 9, 2017

Just went through some code on the OrientDB side to see how they handle dates and realized that the serialization/deserialization behavior for Binary and CSV is different on the server side. The server behavior for date objects when using binary serialization essentially resolves this issue (lines 426, 717 and 998-1012 from https://github.com/orientechnologies/orientdb/blob/2.2.x/core/src/main/java/com/orientechnologies/orient/core/serialization/serializer/record/binary/ORecordSerializerBinaryV0.java#L426)

For CSV serialization/deserialization, the OrientDB server does not perform the conversion from/to <date> 00:00:00 local time to/from <date> 00:00:00 UTC. (lines 638-645 and 474 from https://github.com/orientechnologies/orientdb/blob/7850712aafb3cb7c61a5c2865710019df0a7e8c9/core/src/main/java/com/orientechnologies/orient/core/serialization/serializer/record/string/ORecordSerializerStringAbstract.java#L638).

The latter is what’s causing the issue reported by OP and reflected in the results I posted. Moreover, the server sets the time to 00:00:00 during deserialization after converting the UNIX time sent over to the database time zone. This behavior is honestly baffling to me and very problematic. This essentially can result in +/- 1 or 2 days difference in encoded/decoded values when the server is not in the same time zone as the client. If clients (applications) do know the database timezone, the best way to avoid problems is to send over the date as UNIX time corresponding to <date> 00:00:00 <database time zone> and decode it using the same scheme when using CSV serialization.

Given the above, to mitigate this issue, I agree with the proposal to have the option of specifying a non-default encoding and decoding time zone for Dates when using CSV serialization. For DateTime, I cannot foresee a situation where it would be useful, but having the option can’t hurt!
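One way such an option could look is sketched below. This is not an existing pyorient API: the function names and the db_tz parameter are hypothetical, and a fixed UTC+1 offset stands in for the database timezone purely for illustration. The date is encoded as midnight in the database timezone and decoded with the same zone, as suggested above for CSV serialization.

```python
from datetime import date, datetime, timedelta, timezone


def encode_date(value, db_tz=timezone.utc):
    """Encode a date as milliseconds for <date> 00:00:00 in db_tz."""
    midnight = datetime(value.year, value.month, value.day, tzinfo=db_tz)
    return str(int(midnight.timestamp()) * 1000) + 'a'


def decode_date(collected, db_tz=timezone.utc):
    """Decode a millisecond string (without the 'a' suffix) in db_tz."""
    return datetime.fromtimestamp(float(collected) / 1000, db_tz).date()


# Hypothetical database timezone: a fixed UTC+1 offset.
DB_TZ = timezone(timedelta(hours=1))

original = date(2017, 7, 9)
encoded = encode_date(original, DB_TZ)

same_zone = decode_date(encoded[:-1], DB_TZ)       # matches the original date
wrong_zone = decode_date(encoded[:-1], timezone.utc)  # one day earlier

print(same_zone, wrong_zone)
```

Decoding with a zone other than the one used for encoding shifts the date, which is the mismatch described in this comment; defaulting db_tz to UTC keeps the behavior proposed in the original report.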

Finally, on a related note, it seems that OrientDB stores dates as UNIX time corresponding to <date> 00:00:00 <DB timezone>. I am not sure if they reprocess all dates in the database when the database timezone is changed with ALTER database <timezone>, but I would venture that they do not. This means that the date values stored in the DB (or at least their interpretation) will change if the database timezone is changed after a record was created. As such, I would suggest caution with the ALTER database <timezone> command if you already have records with date fields in OrientDB.
