BUG: Odd timezone offset change with old datetimes with tz_convert
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
[
pd.to_datetime(["1900-01-01"]).tz_localize("UTC").tz_convert("Australia/Broken_Hill"),
pd.to_datetime(["1900-01-01"]).tz_localize("UTC").tz_convert("Pacific/Marquesas"),
pd.to_datetime(["2000-01-01"]).tz_localize("UTC").tz_convert("Pacific/Marquesas"),
pd.to_datetime(["2000-01-01"]).tz_localize("UTC").tz_convert("Australia/Broken_Hill"),
pd.to_datetime(["1900-01-01"]).tz_localize("UTC").tz_convert("Etc/GMT-9"),
pd.to_datetime(["2000-01-01"]).tz_localize("UTC").tz_convert("Etc/GMT-9"),
]
Returns:
[DatetimeIndex(['1900-01-01 09:26:00+09:26'], dtype='datetime64[ns, Australia/Broken_Hill]', freq=None),
DatetimeIndex(['1899-12-31 14:42:00-09:18'], dtype='datetime64[ns, Pacific/Marquesas]', freq=None),
DatetimeIndex(['1999-12-31 14:30:00-09:30'], dtype='datetime64[ns, Pacific/Marquesas]', freq=None),
DatetimeIndex(['2000-01-01 10:30:00+10:30'], dtype='datetime64[ns, Australia/Broken_Hill]', freq=None),
DatetimeIndex(['1900-01-01 09:00:00+09:00'], dtype='datetime64[ns, Etc/GMT-9]', freq=None),
DatetimeIndex(['2000-01-01 09:00:00+09:00'], dtype='datetime64[ns, Etc/GMT-9]', freq=None)]
Problem description
The timezone offset seems to change when it shouldn’t. I’m not 100% sure this is incorrect behavior, but it seems odd. I hope this is known behavior and not something esoteric.
Expected Output
Probably this:
[DatetimeIndex(['1900-01-01 10:30:00+10:30'], dtype='datetime64[ns, Australia/Broken_Hill]', freq=None),
DatetimeIndex(['1899-12-31 14:30:00-09:30'], dtype='datetime64[ns, Pacific/Marquesas]', freq=None),
DatetimeIndex(['1999-12-31 14:30:00-09:30'], dtype='datetime64[ns, Pacific/Marquesas]', freq=None),
DatetimeIndex(['2000-01-01 10:30:00+10:30'], dtype='datetime64[ns, Australia/Broken_Hill]', freq=None),
DatetimeIndex(['1900-01-01 09:00:00+09:00'], dtype='datetime64[ns, Etc/GMT-9]', freq=None),
DatetimeIndex(['2000-01-01 09:00:00+09:00'], dtype='datetime64[ns, Etc/GMT-9]', freq=None)]
I’m not sure these timezones even existed then, so this might be an invalid calculation.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 2cb96529396d93b46abab7bbc73a208e708c642e
python : 3.7.0.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-73-lowlatency
Version : #82-Ubuntu SMP PREEMPT Wed Apr 14 19:19:50 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.4
numpy : 1.19.1
pytz : 2020.5
dateutil : 2.8.1
pip : 20.2.3
setuptools : 50.3.0.post20201006
Cython : None
pytest : 6.2.1
hypothesis : None
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.1
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 1.0.1
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.19
tables : None
tabulate : 0.8.7
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None
Issue Analytics
- Created 2 years ago
- Comments:5 (5 by maintainers)
Thanks for the report, but as mentioned I think this is ultimately a pytz issue (we also have a note in our timezone docs that tz libraries may have different timezone definitions). Since this is an upstream issue, I’m unsure if there’s a pandas-specific fix here, but happy to reopen if there is.
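One way to see where such offsets come from, independent of pandas, is to ask a tz library directly what it reports for each instant. The sketch below uses the standard-library `zoneinfo` module (Python 3.9+, so newer than the 3.7 in the report) instead of pytz. `describe_offset` is a hypothetical helper name, and the printed values depend on the installed tz data, so none are shown here. Zones in the tz database generally start with a local mean time entry that predates standardized offsets, which is one plausible source of values such as -09:18.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+


def describe_offset(utc_iso_date, zone_name):
    """Return (UTC offset, abbreviation) the tz data reports for a UTC instant.

    Hypothetical helper for illustration; not part of pandas or pytz.
    """
    # Parse the date as midnight and mark it as UTC.
    instant = datetime.fromisoformat(utc_iso_date).replace(tzinfo=timezone.utc)
    # Convert to the target zone; the offset is looked up for this instant.
    local = instant.astimezone(ZoneInfo(zone_name))
    return local.utcoffset(), local.tzname()


# Same zones and dates as in the report.
for zone in ("Australia/Broken_Hill", "Pacific/Marquesas", "Etc/GMT-9"):
    for day in ("1900-01-01", "2000-01-01"):
        offset, abbreviation = describe_offset(day, zone)
        minutes = int(offset.total_seconds() // 60)
        print(f"{zone:<22} {day} offset_minutes={minutes} name={abbreviation}")
```

Running the same dates through pytz and comparing the results would show whether the two libraries disagree for a given zone, which is the kind of difference the timezone docs note refers to.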
Yeah, it gets really fuzzy really fast. To be fair, precise sub-day timestamps pre-1900 would be rare. However, IMO it would be important to be able to express them.
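If the goal is to express an old timestamp with one specific offset regardless of what the tz database says about that era, a fixed-offset timezone is one option. This is a minimal sketch using only the standard library; the +10:30 value and the names `FIXED_PLUS_1030` and `to_fixed_offset` are assumptions chosen to match the expected output in the report, not something pandas or pytz prescribes.

```python
from datetime import datetime, timedelta, timezone

# Assumed choice: pin the offset to +10:30 and ignore the zone's history.
FIXED_PLUS_1030 = timezone(timedelta(hours=10, minutes=30))


def to_fixed_offset(utc_iso_date, fixed_tz):
    """Convert a UTC date (midnight) to a fixed-offset timezone.

    Hypothetical helper for illustration only.
    """
    instant = datetime.fromisoformat(utc_iso_date).replace(tzinfo=timezone.utc)
    return instant.astimezone(fixed_tz)


print(to_fixed_offset("1900-01-01", FIXED_PLUS_1030).isoformat())
# prints 1900-01-01T10:30:00+10:30
```

The pandas documentation lists `datetime.tzinfo` among the accepted types for `tz_convert`, so the same fixed-offset object should be usable there as well. The trade-off is that a fixed offset no longer tracks daylight saving or historical changes for the named zone.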