bug: error processing columns of type timestamp with null values (ibis 1.4.0 on postgres)

I'm currently running into an error when executing value_counts() or distinct() on a column of type timestamp that contains null values, with a Postgres database as the backend.

The problem is in the following function in the pandas client backend:

def convert_timezone(obj, timezone):
    """Convert `obj` to the timezone `timezone`.

    Parameters
    ----------
    obj : datetime.date or datetime.datetime

    Returns
    -------
    type(obj)
    """
    if timezone is None:
        return obj.replace(tzinfo=None)
    return pytz.timezone(timezone).localize(obj)

Here the null value from Postgres is stored/interpreted by ibis as None, and the function fails because None has no replace method. I had not expected ibis to fail on null values, and from what I can tell it doesn't fail for columns of other types.

I'm using ibis version 1.4.0, installed from conda-forge, on Windows with Python 3.8.

My current workaround, although less than ideal, is to cast the timestamp to a string and then call distinct() on the new column. A relatively easy fix would be a guard that checks whether obj is actually a datetime or equivalent, although such a check might conflict with the project's performance priorities. A cleaner solution would probably be to interpret a null as pandas.NaT instead of None, since NaT does implement the replace() method.
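To make the guard idea concrete, here is a minimal sketch. It is not the project's actual fix: convert_timezone_guarded is a hypothetical name, and it checks only for None instead of doing a full isinstance check on datetime types, which keeps the extra cost to a single identity comparison.

```python
import datetime


def convert_timezone_guarded(obj, timezone):
    """Hypothetical null-safe variant of the convert_timezone quoted above."""
    # Guard: a SQL NULL arrives as None, so pass it through untouched.
    if obj is None:
        return None
    if timezone is None:
        return obj.replace(tzinfo=None)
    # Imported here so the null and naive paths work without pytz installed;
    # the original function relies on a module-level import instead.
    import pytz
    return pytz.timezone(timezone).localize(obj)


# A null no longer raises AttributeError:
print(convert_timezone_guarded(None, None))

# Non-null values behave as before (tzinfo is stripped when timezone is None):
aware = datetime.datetime(2022, 1, 11, 12, 0, tzinfo=datetime.timezone.utc)
print(convert_timezone_guarded(aware, None))
```

Whether None or pandas.NaT is the better value to return for a null is exactly the open question here; the sketch returns None only to stay free of a pandas dependency.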

Or is there some other way to deal with this situation?

I see some somewhat related issues: an enhancement proposal and an error report. Although the error in that report seems similar in origin to mine, its traceback is very different, and it was reported against version 0.11.2.

Thanks in advance!

Issue Analytics

State:
Created 3 years ago
Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction

cpcloud commented, Jun 9, 2022

I will put up a PR to add a test for this and close it out.

1 reaction

cpcloud commented, Jan 11, 2022

Without pandas having the ability to represent a larger range of dates the “best” solution is to cast the column to a string.

In the medium term we’d like to move away from using pandas as the core in-memory representation.
