question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. ItΒ collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

BUG: resample seems to convert hours to 00:00

See original GitHub issue
  • I have checked that this issue has not already been reported. (As far as I can see by using the search)

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

# Your code here
import datetime as dt
import pandas as pd

start_time = dt.datetime(year=2018, month=6, day=7, hour=18)
end_time = dt.datetime(year=2018, month=6, day=27, hour=18)
time_index_df = pd.date_range(
    start_time,
    end_time,
    freq="12H",
    name="datetime",
).to_frame(index=False)

time_index_df["test"] = 1
time_index_df["name"] = "Name"

time_index_df = time_index_df.resample("2d", convention="end", on="datetime").mean().reset_index()

print(time_index_df["datetime"].values)

time_index_df.to_csv(f"test_resample_{pd.__version__}")

Problem description

On pandas 0.23 the behaviour of resample will keep the correct datetime (18:00 in this case) after resample. But starting from 0.24, after resample, the datetime is now converted to (00:00). [this should explain why the current behaviour is a problem and why the expected output is a better solution]

Expected Output

On. 0.23 [β€˜2018-06-07T18:00:00.000000000’ β€˜2018-06-09T18:00:00.000000000’ β€˜2018-06-11T18:00:00.000000000’ β€˜2018-06-13T18:00:00.000000000’ β€˜2018-06-15T18:00:00.000000000’ β€˜2018-06-17T18:00:00.000000000’ β€˜2018-06-19T18:00:00.000000000’ β€˜2018-06-21T18:00:00.000000000’ β€˜2018-06-23T18:00:00.000000000’ β€˜2018-06-25T18:00:00.000000000’ β€˜2018-06-27T18:00:00.000000000’]

But Starting from 0.24 it is giving me [β€˜2018-06-07T00:00:00.000000000’ β€˜2018-06-09T00:00:00.000000000’ β€˜2018-06-11T00:00:00.000000000’ β€˜2018-06-13T00:00:00.000000000’ β€˜2018-06-15T00:00:00.000000000’ β€˜2018-06-17T00:00:00.000000000’ β€˜2018-06-19T00:00:00.000000000’ β€˜2018-06-21T00:00:00.000000000’ β€˜2018-06-23T00:00:00.000000000’ β€˜2018-06-25T00:00:00.000000000’ β€˜2018-06-27T00:00:00.000000000’]

Output of pd.show_versions()

0.23 INSTALLED VERSIONS

commit: None python: 3.6.9.final.0 python-bits: 64 OS: Linux OS-release: 5.3.0-59-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_AU.UTF-8 LOCALE: en_AU.UTF-8

pandas: 0.23.0 pytest: None pip: 20.0.2 setuptools: 46.0.0 Cython: None numpy: 1.18.5 scipy: None pyarrow: None xarray: None IPython: None sphinx: None patsy: None dateutil: 2.8.1 pytz: 2020.1 blosc: None bottleneck: None tables: None numexpr: None feather: None matplotlib: None openpyxl: 2.5.8 xlrd: 0.9.4 xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: None sqlalchemy: 1.2.19 pymysql: None psycopg2: 2.8.5 (dt dec pq3 ext lo64) jinja2: 2.11.2 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None None

0.24 INSTALLED VERSIONS

commit: None python: 3.6.9.final.0 python-bits: 64 OS: Linux OS-release: 5.3.0-59-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_AU.UTF-8 LOCALE: en_AU.UTF-8

pandas: 0.24.2 pytest: None pip: 20.0.2 setuptools: 46.0.0 Cython: None numpy: 1.18.5 scipy: None pyarrow: None xarray: None IPython: None sphinx: None patsy: None dateutil: 2.8.1 pytz: 2020.1 blosc: None bottleneck: None tables: None numexpr: None feather: None matplotlib: None openpyxl: 2.5.8 xlrd: 1.1.0 xlwt: None xlsxwriter: None lxml.etree: None bs4: None html5lib: None sqlalchemy: 1.2.19 pymysql: None psycopg2: 2.8.5 (dt dec pq3 ext lo64) jinja2: 2.11.2 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None gcsfs: None None

Issue Analytics

  • State:closed
  • Created 3 years ago
  • Comments:8 (4 by maintainers)

github_iconTop GitHub Comments

2reactions
hasB4Kcommented, Jun 18, 2020

@liverpool1026 The behavior has been to follow start_day since 2012 as you can see in this commit: https://github.com/pandas-dev/pandas/commit/31ca168faba43d761f5b53326b18250804ccd6ef. The behavior you are facing is actually a bug that has been fixed in v0.2.4 by #24159. Basically your current code is relying on a bug πŸ˜• …

The behavior of start_day of origin has been preserved as the default value in #31809 to avoid creating breaking changes.

The proof that your code is relying on a bug fixed in v0.2.4:

import datetime as dt
import pandas as pd

def print_resample(example_nb, start, end, resample_freq):
    print(f"\nEXAMPLE {example_nb}: {start} - {end} [{resample_freq}]")
    time_index_df = pd.date_range(start, end, freq="1H", name="datetime").to_frame(index=False)
    time_index_df["test"] = 1
    time_index_df = time_index_df.resample(resample_freq, convention="end", on="datetime").sum().reset_index()
    print(time_index_df)


print_resample(1, "2018-06-07 11:00", "2018-06-10 11:00", "2D")
print_resample(2, "2018-06-07 11:00", "2018-06-10 11:00", "12H")
print_resample(3, "2018-06-07 13:00", "2018-06-10 11:00", "12H")

Outputs (v0.2.3):

EXAMPLE 1: 2018-06-07 11:00 - 2018-06-10 11:00 [2D]
             datetime  test
0 2018-06-07 11:00:00    48
1 2018-06-09 11:00:00    25

EXAMPLE 2: 2018-06-07 11:00 - 2018-06-10 11:00 [12H]
             datetime  test
0 2018-06-07 00:00:00     1
1 2018-06-07 12:00:00    12
2 2018-06-08 00:00:00    12
3 2018-06-08 12:00:00    12
4 2018-06-09 00:00:00    12
5 2018-06-09 12:00:00    12
6 2018-06-10 00:00:00    12

EXAMPLE 3: 2018-06-07 13:00 - 2018-06-10 11:00 [12H]
             datetime  test
0 2018-06-07 12:00:00    11
1 2018-06-08 00:00:00    12
2 2018-06-08 12:00:00    12
3 2018-06-09 00:00:00    12
4 2018-06-09 12:00:00    12
5 2018-06-10 00:00:00    12
  • Example 1 is the buggy one since it uses a frequency bigger than 1D.
  • Example 3 is showing you that the behavior is not to start at midnight of the day of the first timestamp, but we are using the start of the day to align the bins. The argument origin of #31809 is just giving more choice to the user that want to specify on how to align the bins of the resampled data.

Now, that being said… What should you do to align from the start of your timeseries with those constraint before the version 1.1.0? Well first, I would advise to wait a few months the release of 1.1.0… But you could hack a bit around by converting this temporally into Timedeltas and it should work:

import datetime as dt
import pandas as pd

def print_resample_simulate_origin_start(example_nb, start, end, resample_freq):
    print(f"\nEXAMPLE {example_nb}: {start} - {end} [{resample_freq}]")
    time_index_df = pd.date_range(start, end, freq="1H", name="datetime").to_frame(index=False)
    time_index_df["test"] = 1
    
    # hack: transform datetime into timestamps to resample on start of the timeseries
    time_index_df["datetime"] -= pd.Timestamp(0)

    time_index_df = time_index_df.resample(resample_freq, convention="end", on="datetime").sum().reset_index()

    # hack: transform datetime back into timestamps
    time_index_df["datetime"] = pd.to_datetime(time_index_df["datetime"])
    print(time_index_df)


print_resample_simulate_origin_start(1, "2018-06-07 11:00", "2018-06-10 11:00", "2D")
print_resample_simulate_origin_start(2, "2018-06-07 11:00", "2018-06-10 11:00", "12H")
print_resample_simulate_origin_start(3, "2018-06-07 13:00", "2018-06-10 11:00", "12H")

Outputs (v0.2.3):


EXAMPLE 1: 2018-06-07 11:00 - 2018-06-10 11:00 [2D]
             datetime  test
0 2018-06-07 11:00:00    48
1 2018-06-09 11:00:00    25
2 2018-06-11 11:00:00     0

EXAMPLE 2: 2018-06-07 11:00 - 2018-06-10 11:00 [12H]
             datetime  test
0 2018-06-07 11:00:00    12
1 2018-06-07 23:00:00    12
2 2018-06-08 11:00:00    12
3 2018-06-08 23:00:00    12
4 2018-06-09 11:00:00    12
5 2018-06-09 23:00:00    12
6 2018-06-10 11:00:00     1

EXAMPLE 3: 2018-06-07 13:00 - 2018-06-10 11:00 [12H]
             datetime  test
0 2018-06-07 13:00:00    12
1 2018-06-08 01:00:00    12
2 2018-06-08 13:00:00    12
3 2018-06-09 01:00:00    12
4 2018-06-09 13:00:00    12
5 2018-06-10 01:00:00    11

I know this solution is not ideal… But again, this has been fixed in #31809 and it will be in the upcoming 1.1.0. I hope I have answered to your questions/issues @liverpool1026 and @dsandeep0138.

0reactions
liverpool1026commented, Jun 17, 2020

@deepandas11 That makes sense on master, by setting origin=β€œstart” fixes the issue.

However, it will still be an issue in 0.24 and 1.04 given that the origin kwarg was not introduced then.

If I am using 0.24, what’s the way to preserve the hours?

Read more comments on GitHub >

github_iconTop Results From Across the Web

Convert pandas timezone-aware DateTimeIndex to naive ...
It consists of resampling from the last valid value in march, to avoid losing the 1 hour (in my case, all my data...
Read more >
Pandas resample() tricks you should know for manipulating ...
Downsampling is to resample a time-series dataset to a wider time frame. For example, from minutes to hours, from days to years.
Read more >
Time series / date functionality β€” pandas 1.5.2 documentation
Resampling or converting a time series to a particular frequency ... These Timestamp and datetime objects have exact hours, minutes,Β ...
Read more >
Bug listing with status RESOLVED with resolution TEST- ...
Bug :233 - "Emacs segfaults when merged through the sandbox. ... needs to be converted" status:RESOLVED resolution:TEST-REQUEST severity:normal Β· Bug:50851Β ...
Read more >
Troubleshooting RSS in Campaigns
Is there a typo or formatting error in your source code? ... we pull in the feed, check the Resize RSS feed images...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found