bug: error processing columns of type timestamp with null values (ibis 1.4.0 on postgres)
Currently I’m stumbling over an issue when trying to execute value_counts() or distinct() on a column of type timestamp that contains null values, with a postgres database as the backend.
The problem is found in the following function in the pandas client backend:

    def convert_timezone(obj, timezone):
        """Convert `obj` to the timezone `timezone`.

        Parameters
        ----------
        obj : datetime.date or datetime.datetime

        Returns
        -------
        type(obj)
        """
        if timezone is None:
            return obj.replace(tzinfo=None)
        return pytz.timezone(timezone).localize(obj)

Here the null value from postgres is stored/interpreted by ibis as None, and the function fails because None has no replace method (the obj.replace(tzinfo=None) call raises AttributeError). I had not expected ibis to fail on null values, and from what I can tell it doesn’t fail for columns of other types.
I’m using ibis version 1.4.0 from conda-forge on Windows, running Python 3.8.
My current workaround, although less than ideal, is to cast the timestamp column to string and then use distinct() on the new column. I guess a relatively easy solution would be to add a guard checking that obj is indeed a datetime or equivalent, although such a check might go against the project’s performance priorities. A cleaner solution would probably be for a null to be interpreted as pandas.NaT instead of None, since pandas.NaT does implement the replace() method.
Or is there some other way to deal with this situation?
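The guard mentioned above could look roughly like the sketch below. It is not the ibis implementation: strip_tzinfo is a hypothetical stand-in for the "timezone is None" branch of convert_timezone (the branch that calls obj.replace), and strip_tzinfo_guarded is a hypothetical variant that passes None through unchanged. Only the standard library is used, so the pytz branch is left out.

```python
import datetime


def strip_tzinfo(obj):
    """Hypothetical stand-in for the `timezone is None` branch of convert_timezone."""
    return obj.replace(tzinfo=None)


def strip_tzinfo_guarded(obj):
    """Same conversion, but a null (None) is returned untouched (assumed behaviour)."""
    if obj is None:
        return None
    return obj.replace(tzinfo=None)


aware = datetime.datetime(2020, 1, 1, 12, 0, tzinfo=datetime.timezone.utc)

# The guarded variant handles both real timestamps and nulls.
naive = strip_tzinfo_guarded(aware)
null_result = strip_tzinfo_guarded(None)

# The unguarded variant reproduces the failure described in this issue.
try:
    strip_tzinfo(None)
    unguarded_failed = False
except AttributeError:
    unguarded_failed = True
```

The check is a single identity comparison per value; whether that matters for performance would need measuring.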
I see some somewhat related issues: an enhancement proposal and an error report. The error report seems similar in origin to mine, but its traceback is very different, and it concerns version 0.11.2.
Thanks in advance!
Top GitHub Comments
I will put up a PR to add a test for this and close it out.
Without pandas having the ability to represent a larger range of dates, the “best” solution is to cast the column to a string.
In the medium term we’d like to move away from using pandas as the core in-memory representation.