`Series.resample().nlargest` produces incorrect output
Code Sample, a copy-pastable example if possible
With this setup:
import numpy as np
import pandas as pd
n = 1000
dates = pd.date_range(start='2010-01-01', periods=n)
rain_random = pd.Series(data=np.random.uniform(size=n), index=dates)
these two operations give different results:
rain_random.groupby(rain_random.index.year).nlargest(3)
rain_random.resample('A').nlargest(3)
Problem description
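The Series.resample().nlargest() operation is inconsistent with DataFrame.resample()[column].nlargest() and with the groupby equivalent. It also emits a FutureWarning and, as the warning says, implicitly takes the mean of each bin before calling nlargest.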
Output:
/Users/schofield/miniconda/envs/py36/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning:
.resample() is now a deferred operation
You called nlargest(...) on this deferred object which materialized it into a series
by implicitly taking the mean. Use .resample(...).mean() instead
"""Entry point for launching an IPython kernel.
Out[427]:
2010-12-31 0.507550
2012-12-31 0.490082
2011-12-31 0.478356
dtype: float64
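The three values in the output above are all close to 0.5, which fits the warning text: each year is first reduced to its mean, and nlargest(3) then picks among those yearly means. The following is a minimal sketch of that reading, not a trace of the pandas internals. It uses only groupby on the index year, and the fixed seed is an assumption added so the run is reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed: an addition, not in the original report
n = 1000
dates = pd.date_range(start="2010-01-01", periods=n)
rain_random = pd.Series(rng.uniform(size=n), index=dates)

years = rain_random.index.year

# What the reporter wanted: the three largest daily values within each year.
per_year_top3 = rain_random.groupby(years).nlargest(3)

# What the warning describes: collapse each year to its mean,
# then take the three largest of those means.
top3_of_means = rain_random.groupby(years).mean().nlargest(3)

print(per_year_top3)
print(top3_of_means)
```

The first result has one row per (year, date) pair, the second has one row per year with values near 0.5, which is the shape reported in the issue.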
Expected output (shape only; this sample comes from a different series, with five values per year):
Date Date
1930-12-31 1930-10-06 288.135370
1930-10-05 285.587734
1930-10-07 259.439935
1930-10-08 227.587389
1930-10-09 190.054844
1931-12-31 1931-01-26 3052.104566
1931-01-25 2839.126102
1931-01-29 2196.167129
1931-02-01 1953.331709
1931-01-27 1893.975328
1932-12-31 1932-01-19 9526.953864
1932-01-20 4278.291105
1932-03-03 2952.348903
1932-03-02 2946.385433
1932-03-04 2098.108897
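One possible workaround that yields this shape (a year-end label plus the original date) is to group with pd.Grouper rather than calling resample. This is a sketch under assumptions: the fixed seed and the YEAR_END probing are additions, the latter because newer pandas releases spell the year-end alias 'YE' while older ones, including 0.20.1, use 'A'.

```python
import numpy as np
import pandas as pd

# Assumption: pick whichever year-end alias the installed pandas accepts.
try:
    pd.tseries.frequencies.to_offset("YE")
    YEAR_END = "YE"
except ValueError:
    YEAR_END = "A"

rng = np.random.RandomState(0)  # fixed seed: an addition for reproducibility
n = 1000
dates = pd.date_range(start="2010-01-01", periods=n)
rain_random = pd.Series(rng.uniform(size=n), index=dates)

# Group into year-end bins, then take the three largest values per bin.
top3 = rain_random.groupby(pd.Grouper(freq=YEAR_END)).nlargest(3)

# Reference result from the issue's own groupby-by-year call.
by_year = rain_random.groupby(rain_random.index.year).nlargest(3)

print(top3)
```

The outer index level of top3 holds the year-end timestamps, and its values match the groupby-by-year result row for row.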
pd.show_versions() output:
INSTALLED VERSIONS
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.5.0
Issue Analytics
- State:
- Created 6 years ago
- Reactions: 1
- Comments: 10 (6 by maintainers)
Top GitHub Comments
pandas, like virtually all open source projects, is run entirely by volunteers.
The core team will review pull requests.
Since there are 3000+ open issues, most patches must come from the community.
Issues get fixed when folks like you open pull requests.
Pull requests are accepted; this is how issues get addressed in open source.