Cannot compare tz-naive and tz-aware timestamps on concat


What happened:

When concatenating two dask dataframes whose indices have dtype=datetime64[ns, UTC], I get a TypeError: Cannot compare tz-naive and tz-aware timestamps. One of the dask dataframes was created with dd.from_pandas and the other with dd.read_parquet.

What you expected to happen:

A happy concatenation 😉

Minimal Complete Verifiable Example:

import dask.dataframe as dd
import pandas

filename = "outfile.parq"

# create a DF with tz=UTC and write to parquet file
df = pandas.DataFrame(
    columns=["A"], index=pandas.date_range("2018", "2019", freq="MS", tz="UTC"), data=1.0
)
ddf = dd.from_pandas(df, npartitions=1)
ddf.to_parquet(filename)

# read back this dask df into rddf
rddf = dd.read_parquet(filename)

# print indices
print(ddf.index)
print(rddf.index)


# attempt concatenation of dask df from parquet and new dask df
nddf = dd.concat([rddf, ddf])
# this raises the following traceback
# Traceback (most recent call last):
#   File ".../tst_parquet_tz.py", line 22, in <module>
#     nddf = dd.concat([rddf, ddf])
#   File "...\site-packages\dask\dataframe\multi.py", line 1108, in concat
#     if all(
#   File "...\site-packages\dask\dataframe\multi.py", line 1109, in <genexpr>
#     dfs[i].divisions[-1] < dfs[i + 1].divisions[0]
#   File "pandas\_libs\tslibs\timestamps.pyx", line 281, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__
#   File "pandas\_libs\tslibs\timestamps.pyx", line 295, in pandas._libs.tslibs.timestamps._Timestamp._assert_tzawareness_compat
# TypeError: Cannot compare tz-naive and tz-aware timestamps

Anything else we need to know?:

When printing both indices, we see that the dtype of both is datetime64[ns, UTC], yet the representation of the index coming from read-parquet shows tz-naive dates as its divisions.

Dask Index Structure:
npartitions=1
2018-01-01 00:00:00+00:00    datetime64[ns, UTC]
2019-01-01 00:00:00+00:00                    ...
dtype: datetime64[ns, UTC]
Dask Name: from_pandas, 2 tasks
Dask Index Structure:
npartitions=1
2018-01-01    datetime64[ns, UTC]
2019-01-01                    ...
dtype: datetime64[ns, UTC]
Dask Name: read-parquet, 2 tasks
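
The failing comparison in the traceback (`dfs[i].divisions[-1] < dfs[i + 1].divisions[0]`) can be sketched with pandas alone. This is only an illustration: it assumes the divisions held by the read-parquet frame really are the tz-naive timestamps printed above, and `can_order` is a hypothetical helper, not part of dask or pandas.

```python
import pandas as pd

# Assumption: these mirror the printed divisions, not values read from dask itself.
naive_last = pd.Timestamp("2019-01-01")              # last division shown for read-parquet
aware_first = pd.Timestamp("2018-01-01", tz="UTC")   # first division shown for from_pandas


def can_order(left, right):
    """Return True if `left < right` can be evaluated, False if it raises TypeError."""
    try:
        left < right
    except TypeError:
        return False
    return True


# Same shape as the check in dd.concat: previous frame's last division
# against the next frame's first division.
print(can_order(naive_last, aware_first))
# Two tz-aware timestamps can be ordered without error.
print(can_order(aware_first, pd.Timestamp("2019-01-01", tz="UTC")))
```

If the first call reports that the ordering fails while the second succeeds, the mismatch is between the division values themselves rather than the index dtype, which matches what the two printed structures suggest.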

Environment:

  • Dask version: 2.30.0
  • Python version: 3.8
  • Operating System: win10
  • Install method (conda, pip, source): pip

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, Dec 4, 2020

Will look…

I’ll mention that parquet does not store time zones; this information only goes into the pandas metadata (i.e., it exists only for data that originally came from pandas), so it is quite likely that this hasn’t been implemented in fastparquet at all.
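
If the time zone is indeed dropped for the division boundaries on read, one way to picture the missing step is to re-attach the zone before any ordering check. The sketch below is pandas-only and makes two assumptions: the data was written as UTC, and the boundaries are available as a plain tuple of timestamps. `localize_naive` is a hypothetical helper; it does not touch dask or fastparquet, and whether the result can be assigned back to a dask frame's divisions is not covered here.

```python
import pandas as pd


def localize_naive(boundaries, tz="UTC"):
    """Attach `tz` to tz-naive timestamps and leave tz-aware ones untouched."""
    return tuple(
        ts.tz_localize(tz) if ts.tzinfo is None else ts
        for ts in map(pd.Timestamp, boundaries)
    )


# Assumed boundaries, mirroring the two index structures printed in the issue.
naive_divisions = (pd.Timestamp("2018-01-01"), pd.Timestamp("2019-01-01"))
aware_divisions = (
    pd.Timestamp("2018-01-01", tz="UTC"),
    pd.Timestamp("2019-01-01", tz="UTC"),
)

fixed = localize_naive(naive_divisions)

# The ordering check from the traceback now evaluates instead of raising.
# It is False here because the two frames cover the same date range.
print(fixed[-1] < aware_divisions[0])
```

Localizing rather than converting is deliberate: the naive values are taken to already be UTC wall-clock times, so only the zone label is missing.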

0 reactions
sdementen commented, Dec 16, 2020

I have checked against master for both dask and fastparquet, and the issue is still there.
