BUG: resample seems to convert hours to 00:00
- I have checked that this issue has not already been reported (as far as I can tell from searching).
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import datetime as dt
import pandas as pd
start_time = dt.datetime(year=2018, month=6, day=7, hour=18)
end_time = dt.datetime(year=2018, month=6, day=27, hour=18)
time_index_df = pd.date_range(
    start_time,
    end_time,
    freq="12H",
    name="datetime",
).to_frame(index=False)
time_index_df["test"] = 1
time_index_df["name"] = "Name"
time_index_df = time_index_df.resample("2d", convention="end", on="datetime").mean().reset_index()
print(time_index_df["datetime"].values)
time_index_df.to_csv(f"test_resample_{pd.__version__}")
Problem description
On pandas 0.23, resample keeps the original time of day (18:00 in this case) in the resampled timestamps. Starting from 0.24, the resampled timestamps are converted to 00:00 instead.
Expected Output
On 0.23: ['2018-06-07T18:00:00.000000000' '2018-06-09T18:00:00.000000000' '2018-06-11T18:00:00.000000000' '2018-06-13T18:00:00.000000000' '2018-06-15T18:00:00.000000000' '2018-06-17T18:00:00.000000000' '2018-06-19T18:00:00.000000000' '2018-06-21T18:00:00.000000000' '2018-06-23T18:00:00.000000000' '2018-06-25T18:00:00.000000000' '2018-06-27T18:00:00.000000000']
Starting from 0.24, it gives: ['2018-06-07T00:00:00.000000000' '2018-06-09T00:00:00.000000000' '2018-06-11T00:00:00.000000000' '2018-06-13T00:00:00.000000000' '2018-06-15T00:00:00.000000000' '2018-06-17T00:00:00.000000000' '2018-06-19T00:00:00.000000000' '2018-06-21T00:00:00.000000000' '2018-06-23T00:00:00.000000000' '2018-06-25T00:00:00.000000000' '2018-06-27T00:00:00.000000000']
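The 00:00 labels are consistent with 2-day bins anchored at midnight of the first day (the start_day behavior described in the comments), while the 0.23 output matches bins anchored at the first timestamp itself. Below is a minimal sketch of that labelling arithmetic, assuming fixed-width bins. It is an illustration, not pandas' internal implementation, and the helper bin_labels and its anchor parameter are hypothetical:

```python
import pandas as pd


def bin_labels(index, freq, anchor="start_day"):
    """Hypothetical helper: label each timestamp with the left edge of its bin.

    anchor="start_day" puts the first bin edge at midnight of the first day;
    anchor="start" puts it at the first timestamp itself.
    """
    step = pd.Timedelta(freq)
    first = index[0]
    origin = first.normalize() if anchor == "start_day" else first
    # Whole number of bins between the origin and each timestamp, times the bin width
    return pd.DatetimeIndex([origin + ((ts - origin) // step) * step for ts in index])


# Same 12-hourly timestamps as in the code sample above
idx = pd.date_range("2018-06-07 18:00", "2018-06-27 18:00", freq=pd.Timedelta(hours=12))
print(bin_labels(idx, "2D", "start_day").unique())  # labels at midnight, like 0.24+
print(bin_labels(idx, "2D", "start").unique())      # labels at 18:00, like 0.23
```

Only the choice of origin differs between the two calls; the bin width and the data are identical.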
Output of pd.show_versions()
0.23 INSTALLED VERSIONS
commit: None python: 3.6.9.final.0 python-bits: 64 OS: Linux OS-release: 5.3.0-59-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_AU.UTF-8 LOCALE: en_AU.UTF-8
pandas: 0.23.0 pytest: None pip: 20.0.2 setuptools: 46.0.0 Cython: None numpy: 1.18.5 scipy: None pyarrow: None xarray: None IPython: None sphinx: None patsy: None dateutil: 2.8.1 pytz: 2020.1 blosc: None bottleneck: None tables: None numexpr: None feather: None matplotlib: None openpyxl: 2.5.8 xlrd: 0.9.4 xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: None sqlalchemy: 1.2.19 pymysql: None psycopg2: 2.8.5 (dt dec pq3 ext lo64) jinja2: 2.11.2 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None None
0.24 INSTALLED VERSIONS
commit: None python: 3.6.9.final.0 python-bits: 64 OS: Linux OS-release: 5.3.0-59-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_AU.UTF-8 LOCALE: en_AU.UTF-8
pandas: 0.24.2 pytest: None pip: 20.0.2 setuptools: 46.0.0 Cython: None numpy: 1.18.5 scipy: None pyarrow: None xarray: None IPython: None sphinx: None patsy: None dateutil: 2.8.1 pytz: 2020.1 blosc: None bottleneck: None tables: None numexpr: None feather: None matplotlib: None openpyxl: 2.5.8 xlrd: 1.1.0 xlwt: None xlsxwriter: None lxml.etree: None bs4: None html5lib: None sqlalchemy: 1.2.19 pymysql: None psycopg2: 2.8.5 (dt dec pq3 ext lo64) jinja2: 2.11.2 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None gcsfs: None None
Top GitHub Comments
@liverpool1026 The behavior has been to follow start_day since 2012, as you can see in this commit: https://github.com/pandas-dev/pandas/commit/31ca168faba43d761f5b53326b18250804ccd6ef. The behavior you observed in 0.23 is actually a bug that was fixed in v0.24 by #24159. Basically, your current code is relying on a bug...
The start_day behavior (origin="start_day") has been preserved as the default value in #31809 to avoid introducing breaking changes. The proof that your code is relying on a bug fixed in v0.24:
Outputs (v0.23):
The origin argument added in #31809 just gives users more choice over how the bins of the resampled data are aligned. Now, that being said... What should you do to align the bins on the start of your time series, given those constraints, before version 1.1.0? Well, first, I would advise waiting a few months for the 1.1.0 release... But you could hack around it by temporarily converting the index into Timedeltas, and it should work:
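The code for this workaround did not survive in this copy of the thread. A minimal sketch of the idea as described: subtract the first timestamp so the index becomes a TimedeltaIndex starting at 0, resample, then add the timestamp back. It assumes the reporter's data reduced to the numeric test column, and the variable names are hypothetical:

```python
import datetime as dt

import pandas as pd

# Rebuild the reporter's data: 12-hourly points starting at 18:00
df = pd.date_range(
    dt.datetime(2018, 6, 7, 18),
    dt.datetime(2018, 6, 27, 18),
    freq=pd.Timedelta(hours=12),
    name="datetime",
).to_frame(index=False)
df["test"] = 1

start = df["datetime"].iloc[0]
shifted = df.set_index("datetime")[["test"]]
# DatetimeIndex - Timestamp -> TimedeltaIndex whose first value is 0
shifted.index = shifted.index - start
# Timedelta bins start at the smallest value (0), i.e. at the first timestamp
resampled = shifted.resample("2D").mean()
# TimedeltaIndex + Timestamp -> DatetimeIndex, so labels land back on 18:00
resampled.index = resampled.index + start
print(resampled.index.values)
```

Only the numeric column is kept because averaging the string name column would fail on recent pandas versions.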
Outputs (v0.23):
I know this solution is not ideal... But again, this has been fixed in #31809 and will be in the upcoming 1.1.0. I hope I have answered your questions, @liverpool1026 and @dsandeep0138.
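On pandas 1.1.0 and later, which is where #31809 landed, the bins can be anchored explicitly instead. A minimal sketch, assuming the reporter's data reduced to the numeric test column; origin and offset are resample parameters available from pandas 1.1.0:

```python
import datetime as dt

import pandas as pd

df = pd.date_range(
    dt.datetime(2018, 6, 7, 18),
    dt.datetime(2018, 6, 27, 18),
    freq=pd.Timedelta(hours=12),
    name="datetime",
).to_frame(index=False)
df["test"] = 1

# origin="start" anchors the first bin on the first timestamp (18:00)
by_start = df.resample("2D", on="datetime", origin="start")["test"].mean()

# Alternative: keep the default origin="start_day" (midnight) and shift the edges by 18 hours
by_offset = df.resample("2D", on="datetime", offset="18h")["test"].mean()

print(by_start.index.values)
```

The origin="start" form tracks the data, while offset hard-codes the 18:00 alignment; for this data both give the same labels.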
@deepandas11 That makes sense: on master, setting origin="start" fixes the issue.
However, it is still an issue in 0.24 and 1.0.4, since the origin kwarg had not been introduced yet.
If I am using 0.24, what is the way to preserve the hours?