Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

BUG: generated timestamps deviate from expected values

See original GitHub issue

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug DOES NOT exist on the main branch of pandas (1.4.3 at the time of creating this Issue).

Reproducible Example

import pandas as pd
import db_dtypes

value = "23:59:59.999999"
expected = datetime.time(23, 59, 59, 999999)

def test_time_parsing(value, expected):
    assert pandas.Series([value], dtype="dbtime")[0] == expected

# The results of the above assert are:

# assert datetime.time(23, 59, 58, 999271) == datetime.time(23, 59, 59, 999999)

Issue Description

Formatted time strings are incorrectly converted to datetime.time objects.

When running the prerelease tests for python-db-dtypes-pandas and checking for prerelease issues that might be associated with the pandas dev branches, we found identified that several python-db-dtypes-pandas tests failed depending on which nightly pandas branch we use.

In particular between these two commits:

b74dc5c077 yields PASSING TESTS
87da50042c yields FAILING TESTS

Not sure yet if this is due to something that derives from the python-db-dtypes-pandas lib OR from pandas OR from one of pandas’ dependencies.

NOTE: the reproducible example provided is just one of many tests in the test suite that fail with a wide variety of different times, etc.

Expected Behavior

The expected result of the reproducible example should be:

assert datetime.time(23, 59, 59, 999999) == datetime.time(23, 59, 59, 999999)

Installed Versions

For reference here are the installed libraries when we ran these tests. When running the failing tests, the only package that changed was the version of pandas. All other packages were the same:

python==3.10
# >>>>>>>>>>>>>>>>>>>>>>
asttokens==2.0.8
asyncmock==0.4.2
attrs==22.1.0
backcall==0.2.0
coverage==6.4.4
-e git+ssh://git@github.com/googleapis/python-db-dtypes-pandas.git@23d88d9e843e903a6e5a123123825de1bd88c722#egg=db_dtypes
decorator==5.1.1
executing==0.10.0
iniconfig==1.1.1
ipython==8.4.0
jedi==0.18.1
matplotlib-inline==0.1.6
mock==4.0.3
numpy==1.23.2
packaging==21.3
pandas==1.5.0.dev0+686.g5b76b77399  # A passing version of pandas
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
pluggy==1.0.0
prompt-toolkit==3.0.30
ptyprocess==0.7.0
pure-eval==0.2.2
py==1.11.0
pyarrow==9.0.0
Pygments==2.13.0
pyparsing==3.0.9
pytest==7.1.2
pytest-asyncio==0.19.0
pytest-cov==3.0.0
python-dateutil==2.8.2
pytz==2022.2.1
six==1.16.0
stack-data==0.4.0
tomli==2.0.1
traitlets==5.3.0
wcwidth==0.2.5

Issue Analytics

State:
Created a year ago
Comments:13 (13 by maintainers)

Top GitHub Comments

2reactions

tswastcommented, Aug 26, 2022

@phofl @chalmerlowe I’m able to reproduce this without the db-dtypes package.

In [14]: s = pd.Series([pd.Timestamp(
    year=1970, month=1, day=1, hour=23, minute=59, second=59, nanosecond=999_999_000
)], dtype='datetime64[ns]')

In [15]: s
Out[15]: 
0   1970-01-01 23:59:58.999271621
dtype: datetime64[ns]

I believe the problem might lie here: https://github.com/pandas-dev/pandas/blob/e94faa23e24c0abf9db74d79cfebe06676577867/pandas/_libs/tslibs/conversion.pyx#L391

As the call to total_seconds() can introduce floating point precision errors.

Edit: I see that total_seconds() logic was there before, but it seems something in that commit is resulting in a loss of precision.

0reactions

tswastcommented, Sep 16, 2022

I think to avoid this all together, no more than 999 nanoseconds should be allowed in the Timedelta constructor to mirror the behavior of other components in the datetime constructor. https://github.com/pandas-dev/pandas/issues/48538

A little unfortunate, but easy enough to workaround in https://github.com/googleapis/python-db-dtypes-pandas/pull/148

If we can make this an error in 1.5 instead of an overflow, that would save a lot of debugging time for other folks who might hit this issue.