BUG: resample closed='left' not binning correctly.

related: http://stackoverflow.com/questions/21329425/resampling-a-pandas-dataframe-with-loffset-introduces-an-additional-offset-of-an

Hey pandas team. Sorry to have gone MIA the past week, super busy with work. I promise (and look forward to) contributing more soon. 😃

Still, I wanted to note that I came across what I believe to be a bug in resample() when trying to change the interval of the binning with closed='left'. I know that there have been a few changes to the resample() API since Wes’ book. I don’t believe they changed this functionality, but I have been wrong before 😃

Bug can be reproduced using the example from Wes’ book, generating 12 mins of data like:

In [3]: rng = pd.date_range('1/1/2000', periods=12, freq='T')
In [4]: ts = pd.Series(np.arange(12), index=rng)
In [5]: ts
Out[5]: 
2000-01-01 00:00:00     0
2000-01-01 00:01:00     1
2000-01-01 00:02:00     2
2000-01-01 00:03:00     3
2000-01-01 00:04:00     4
2000-01-01 00:05:00     5
2000-01-01 00:06:00     6
2000-01-01 00:07:00     7
2000-01-01 00:08:00     8
2000-01-01 00:09:00     9
2000-01-01 00:10:00    10
2000-01-01 00:11:00    11
Freq: T, dtype: int64

we can do a simple resample to 5 mins like:

In [6]: ts.resample('5min', how='sum')
Out[6]: 
2000-01-01 00:00:00    10
2000-01-01 00:05:00    35
2000-01-01 00:10:00    21
Freq: 5T, dtype: int64

For my use, I need this resampling to be ‘backwards looking’ so that the summations at each resampled timestamp include the previous 4 minutes. Documentation (and Wes’ book) suggest this is achieved by binning with closed='left', however, this results in the same output as above:

In [7]: ts.resample('5min', how='sum', closed='left')
Out[7]: 
2000-01-01 00:00:00    10
2000-01-01 00:05:00    35
2000-01-01 00:10:00    21
Freq: 5T, dtype: int64
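
One possible reading of the identical output, sketched here with current pandas syntax (method chaining instead of how=, and the 'min' alias instead of 'T'): for minute-level frequencies closed='left' is already the default, so passing it explicitly changes nothing. The label argument is what moves the timestamp to the right edge of each bin. The variable names below are illustrative only.

```python
import numpy as np
import pandas as pd

# Same 12 minutes of data as in the report above.
rng = pd.date_range("2000-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(12), index=rng)

# Default binning for a 5-minute rule: closed='left', label='left'.
default_sum = ts.resample("5min").sum()

# Passing closed='left' explicitly gives the same bins and labels.
left_sum = ts.resample("5min", closed="left").sum()

# label='right' stamps each bin with its right edge instead,
# so the first bin [00:00, 00:05) is reported at 00:05.
right_label = ts.resample("5min", closed="left", label="right").sum()

print(default_sum)
print(right_label)
```

With label='right' the first two bins match the result asked for below, but the partial bin covering 00:10 and 00:11 is still present, now labelled 00:15.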

I was looking for the following result (note that the first timestamp is at 00:05:00 and with hanging data dropped):

2000-01-01 00:05:00    10
2000-01-01 00:10:00    35
Freq: 5T, dtype: int64

I am able to generate this by combining loffset='5min' and then slicing into the resultant Series to remove the final hanging bin:

In [10]: ts.resample('5min', how='sum', closed='left', loffset='5min')[:-1]
Out[10]: 
2000-01-01 00:05:00    10
2000-01-01 00:10:00    35
Freq: 5T, dtype: int64

but this is hardly ideal, as it’s not known in advance whether the time series ends with a timestamp that resolves equally to the final timestamp of the resampling procedure!
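
A minimal sketch of one way to drop the hanging bin without slicing by position, assuming the data is sampled every minute with no gaps, so a complete bin always holds exactly 5 points. It is written against current pandas syntax; points_per_bin, stats and backward_sum are illustrative names, not part of the original report.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2000-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(12), index=rng)

# Assumption: gap-free one-minute samples, so a full 5-minute bin has 5 points.
points_per_bin = 5

# Label each left-closed bin by its right edge and compute sum and count together.
stats = ts.resample("5min", closed="left", label="right").agg(["sum", "count"])

# Keep only bins that are completely filled; the trailing partial bin is dropped
# regardless of where the series happens to end.
backward_sum = stats.loc[stats["count"] == points_per_bin, "sum"]

print(backward_sum)
```

If the series had ended exactly on a bin boundary, no bin would be short and nothing would be dropped. With irregular or missing samples the count test would need a different completeness rule.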

Apologies if I am missing something—any thoughts, help or guidance is welcomed! Thanks so much.

Issue Analytics

  • State: closed
  • Created 10 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
mroeschke commented, Mar 31, 2020

I agree that I think base is the proper way to handle this. Changing the resample behavior after all this time might be too much of a change. Additionally https://github.com/pandas-dev/pandas/pull/31809 will provide an easier way to specify where the origin timestamp should start at. Closing.
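
As a rough illustration of shifting where bins start: recent pandas releases accept origin and offset keywords on resample, and this sketch assumes those are the enhancements the linked pull request refers to. The 2-minute offset is an arbitrary value chosen only to make the shifted bin edges visible.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2000-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(12), index=rng)

# Default: 5-minute bins are aligned to midnight (00:00, 00:05, 00:10, ...).
aligned = ts.resample("5min").sum()

# offset shifts every bin edge by the given amount, so the edges become
# ..., 23:57, 00:02, 00:07, ... (arbitrary 2-minute shift, for illustration).
shifted = ts.resample("5min", offset="2min").sum()

print(aligned)
print(shifted)
```

This only moves the bin edges. It does not by itself drop a partially filled final bin.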

0 reactions
nehalecky commented, Mar 31, 2020

Thanks @mroeschke! Look forward to checking out enhancements to resample provided by #31809. Keep it up, pandas team!
