Cannot compare tz-naive and tz-aware timestamps on concat
What happened:
When concatenating two dask dataframes whose indices have dtype=datetime64[ns, UTC], I get a TypeError: Cannot compare tz-naive and tz-aware timestamps. One of the dask dataframes was created with dd.from_pandas and the other with dd.read_parquet.
What you expected to happen:
A happy concatenation 😉
Minimal Complete Verifiable Example:
```python
import dask.dataframe as dd
import pandas

filename = "outfile.parq"

# create a DF with tz=UTC and write to parquet file
df = pandas.DataFrame(
    columns=["A"], index=pandas.date_range("2018", "2019", freq="MS", tz="UTC"), data=1.0
)
ddf = dd.from_pandas(df, npartitions=1)
ddf.to_parquet(filename)

# read back this dask df into rddf
rddf = dd.read_parquet(filename)

# print indices
print(ddf.index)
print(rddf.index)

# attempt concatenation of dask df from parquet and new dask df
nddf = dd.concat([rddf, ddf])

# this raises the following traceback
# Traceback (most recent call last):
#   File ".../tst_parquet_tz.py", line 22, in <module>
#     nddf = dd.concat([rddf, ddf])
#   File "...\site-packages\dask\dataframe\multi.py", line 1108, in concat
#     if all(
#   File "...\site-packages\dask\dataframe\multi.py", line 1109, in <genexpr>
#     dfs[i].divisions[-1] < dfs[i + 1].divisions[0]
#   File "pandas\_libs\tslibs\timestamps.pyx", line 281, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__
#   File "pandas\_libs\tslibs\timestamps.pyx", line 295, in pandas._libs.tslibs.timestamps._Timestamp._assert_tzawareness_compat
# TypeError: Cannot compare tz-naive and tz-aware timestamps
```
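The failing check in the traceback is `dfs[i].divisions[-1] < dfs[i + 1].divisions[0]`. As a minimal pandas-only sketch (it does not call dask), that comparison can be reproduced with hard-coded stand-ins for the two frames' divisions, assuming they look like the values in the printed indices: tz-naive for the read_parquet frame and UTC-aware for the from_pandas frame. The names `rddf_divisions`, `ddf_divisions` and `divisions_in_order` are illustrative only.

```python
import pandas as pd

# Illustrative stand-ins for each frame's divisions (hard-coded, not read from dask):
# the read_parquet frame shows tz-naive division values, the from_pandas frame UTC-aware ones.
rddf_divisions = (pd.Timestamp("2018-01-01"), pd.Timestamp("2019-01-01"))
ddf_divisions = (
    pd.Timestamp("2018-01-01", tz="UTC"),
    pd.Timestamp("2019-01-01", tz="UTC"),
)


def divisions_in_order(first, second):
    """Same comparison as in the traceback: first.divisions[-1] < second.divisions[0]."""
    return first[-1] < second[0]


try:
    divisions_in_order(rddf_divisions, ddf_divisions)
except TypeError as exc:
    # Ordering a tz-naive Timestamp against a tz-aware one raises TypeError in pandas.
    print(f"TypeError: {exc}")
```

Comparing two tz-aware (or two tz-naive) timestamps works; only the mixed comparison raises, which is why the matching index dtype alone does not save the concat here.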
Anything else we need to know?:
When printing both indices, we see that both have dtype datetime64[ns, UTC], yet the division values shown for the index read back with read_parquet are tz-naive dates (no +00:00 offset).
Output of print(ddf.index) (from_pandas):
```
Dask Index Structure:
npartitions=1
2018-01-01 00:00:00+00:00    datetime64[ns, UTC]
2019-01-01 00:00:00+00:00                    ...
dtype: datetime64[ns, UTC]
Dask Name: from_pandas, 2 tasks
```
Output of print(rddf.index) (read_parquet):
```
Dask Index Structure:
npartitions=1
2018-01-01    datetime64[ns, UTC]
2019-01-01                    ...
dtype: datetime64[ns, UTC]
Dask Name: read-parquet, 2 tasks
```
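The mismatch described above (a UTC dtype paired with tz-naive divisions) can be detected by comparing the tz-awareness of each division value against the index dtype. `divisions_match_dtype` is a hypothetical helper, not a dask API; this sketch works on plain pandas objects, and with a real dask frame one would presumably pass `rddf.divisions` and `rddf.index.dtype`.

```python
import pandas as pd


def divisions_match_dtype(divisions, dtype):
    """Hypothetical check: True if every division has the same tz-awareness as `dtype`.

    `dtype` is either a tz-aware pd.DatetimeTZDtype or a tz-naive numpy datetime64 dtype.
    """
    dtype_tz = getattr(dtype, "tz", None)  # numpy datetime64[ns] has no .tz -> None
    return all((d.tzinfo is None) == (dtype_tz is None) for d in divisions)


utc_dtype = pd.DatetimeTZDtype(unit="ns", tz="UTC")
aware = (pd.Timestamp("2018-01-01", tz="UTC"), pd.Timestamp("2019-01-01", tz="UTC"))
naive = (pd.Timestamp("2018-01-01"), pd.Timestamp("2019-01-01"))

print(divisions_match_dtype(aware, utc_dtype))  # True  (like the from_pandas index)
print(divisions_match_dtype(naive, utc_dtype))  # False (like the read_parquet index)
```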
Environment:
- Dask version: 2.30.0
- Python version: 3.8
- Operating System: Windows 10
- Install method (conda, pip, source): pip
Comments:
Will look…
I’ll mention that parquet does not store time zones; this information only goes into the pandas metadata (i.e., only for data that originally came from pandas), so it is quite likely that this hasn’t been implemented in fastparquet at all.
I have checked with the master branches of dask and fastparquet, and the issue is still there.
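Since the time zone survives only in the pandas metadata, one hypothetical repair is to re-attach it to naive divisions using the index dtype. This sketch assumes the naive values are UTC instants (as with parquet's UTC-adjusted timestamps), so it localizes them to UTC and then converts to the dtype's zone. `localize_divisions` is an illustrative name, not something dask or fastparquet provide.

```python
import pandas as pd


def localize_divisions(divisions, dtype):
    """Hypothetical repair: make tz-naive divisions tz-aware to match `dtype`.

    Assumes naive values represent UTC instants, so they are localized to UTC
    and then converted to the dtype's timezone. Aware values are left as-is.
    """
    tz = getattr(dtype, "tz", None)
    if tz is None:
        return tuple(divisions)  # tz-naive dtype: nothing to repair
    return tuple(
        d.tz_localize("UTC").tz_convert(tz) if d.tzinfo is None else d
        for d in divisions
    )


utc_dtype = pd.DatetimeTZDtype(unit="ns", tz="UTC")
naive = (pd.Timestamp("2018-01-01"), pd.Timestamp("2019-01-01"))
fixed = localize_divisions(naive, utc_dtype)

# Both sides are now tz-aware, so the ordering check no longer raises.
print(fixed[-1] < pd.Timestamp("2020-01-01", tz="UTC"))  # True
```

For a UTC dtype the convert step changes nothing; it only matters when the index uses another zone.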