`Series.resample().nlargest` produces incorrect output

Code Sample, a copy-pastable example if possible

With this setup:

import numpy as np
import pandas as pd
n = 1000
dates = pd.date_range(start='2010-01-01', periods=n)
rain_random = pd.Series(data=np.random.uniform(size=n), index=dates)

these two operations give different results:

rain_random.groupby(rain_random.index.year).nlargest(3)

rain_random.resample('A').nlargest(3)

Problem description

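The Series.resample().nlargest() operation is inconsistent with DataFrame.resample()[column].nlargest() and the groupby equivalent. It emits a FutureWarning and, as the warning states, implicitly takes the mean of each bin before calling nlargest, so the result is the three largest annual means rather than the three largest values within each year.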

Output:

/Users/schofield/miniconda/envs/py36/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: 
.resample() is now a deferred operation
You called nlargest(...) on this deferred object which materialized it into a series
by implicitly taking the mean.  Use .resample(...).mean() instead
  """Entry point for launching an IPython kernel.
Out[427]:
2010-12-31    0.507550
2012-12-31    0.490082
2011-12-31    0.478356
dtype: float64

Expected output:

Date        Date      
1930-12-31  1930-10-06      288.135370
            1930-10-05      285.587734
            1930-10-07      259.439935
            1930-10-08      227.587389
            1930-10-09      190.054844
1931-12-31  1931-01-26     3052.104566
            1931-01-25     2839.126102
            1931-01-29     2196.167129
            1931-02-01     1953.331709
            1931-01-27     1893.975328
1932-12-31  1932-01-19     9526.953864
            1932-01-20     4278.291105
            1932-03-03     2952.348903
            1932-03-02     2946.385433
            1932-03-04     2098.108897
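
A possible workaround, sketched here as an assumption rather than as the fix adopted by pandas, is to skip resample() and group with pd.Grouper at a year-end frequency. This should give a result of the expected shape: a two-level index with the year-end bin label on the outside and the original date on the inside. The seed and the names year_end, top3 and by_year are illustrative and do not come from the report.

```python
import numpy as np
import pandas as pd

# Same setup as in the report, but seeded so the run is repeatable
# (the seed is an addition for illustration).
rng = np.random.default_rng(0)
n = 1000
dates = pd.date_range(start="2010-01-01", periods=n)
rain_random = pd.Series(rng.uniform(size=n), index=dates)

# Group into calendar-year bins labelled by year end, then take the
# three largest values inside each bin. Passing an offset object
# avoids depending on a particular frequency alias string.
year_end = pd.offsets.YearEnd()
top3 = rain_random.groupby(pd.Grouper(freq=year_end)).nlargest(3)

# Reference result from the plain groupby used in the report.
by_year = rain_random.groupby(rain_random.index.year).nlargest(3)

print(top3)
```

The values in top3 should match those in by_year; only the outer index label differs (a year-end timestamp instead of an integer year).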

pd.show_versions() output:

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.5.0

Issue Analytics

Created 6 years ago
Reactions: 1
Comments: 10 (6 by maintainers)

Top GitHub Comments

1 reaction

jreback commented, Jun 8, 2020

pandas and virtually all open source projects are all-volunteer

the core team will review pull requests

since there are 3000+ open issues, most patches must come from the community

issues get fixed when folks like you open pull requests

1 reaction

jreback commented, Jun 7, 2020

pull requests are accepted; this is how issues get addressed in open source
