Quantile function fails when performing groupby on Time Zone Aware Timestamps
Code Sample, a copy-pastable example if possible
Maybe not a high-priority bug, but I have the feeling it can easily be fixed; I just do not understand the internals well enough to fix it myself. Please find below the MCVE to reproduce it:
import numpy as np
import pandas as pd
# Sample Dataset:
n = 200
c = np.random.choice([0,1,2], size=(n,))
d = np.random.randn(n)
t = pd.date_range(start='2020-04-19 00:00:00', freq='1T', periods=n, tz='UTC')
df = pd.DataFrame([r for r in zip(c, t, d)], columns=['category', 'timestamp', 'value'])
df['rtime'] = df['timestamp'].dt.floor('1H')
# Failing operation:
df.groupby('rtime').quantile([0.1, 0.5, 0.9])
Problem description
The traceback of the error is rather terse, and I do not have enough experience with the pandas source code to work out the details:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-35-aff2c9a206a6> in <module>
----> 1 df.groupby('rtime').quantile([0.1,0.2])
/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in quantile(self, q, interpolation)
1926 interpolation=interpolation,
1927 )
-> 1928 for qi in q
1929 ]
1930 result = concat(results, axis=0, keys=q)
/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in <listcomp>(.0)
1926 interpolation=interpolation,
1927 )
-> 1928 for qi in q
1929 ]
1930 result = concat(results, axis=0, keys=q)
/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in _get_cythonized_result(self, how, cython_dtype, aggregate, needs_values, needs_mask, needs_ngroups, result_is_index, pre_processing, post_processing, **kwargs)
2289 func = partial(func, ngroups)
2290
-> 2291 func(**kwargs) # Call func to modify indexer values in place
2292
2293 if result_is_index:
pandas/_libs/groupby.pyx in pandas._libs.groupby.__pyx_fused_cpdef()
TypeError: No matching signature found
I have found similar issues on GitHub raising the same exception, but the message is too generic to be sure they describe the same underlying problem. Additionally, I may have found a simple corner case involving TZ-aware timestamps.
I had a hard time reproducing the error while building the MCVE; I finally found out that it is related to the presence of an extra column holding Time Zone aware timestamps.
Maybe the fix is just a matter of updating the function signature to accept TZ-aware timestamps.
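For reference, a tz-aware column does not carry the plain datetime64[ns] dtype but a separate pandas extension dtype, which may be why the fused-type dispatch in the Cython kernel finds no matching signature (a hedged illustration of the dtype difference, not a diagnosis of the Cython code):

```python
import pandas as pd

# The same timestamps, tz-naive vs tz-aware: the dtypes differ,
# and the tz-aware one is a pandas extension dtype rather than
# a plain NumPy datetime64[ns].
naive = pd.Series(pd.date_range('2020-04-19', periods=3, freq='min'))
aware = naive.dt.tz_localize('UTC')

print(naive.dtype)  # datetime64[ns]
print(aware.dtype)  # datetime64[ns, UTC]
print(isinstance(aware.dtype, pd.DatetimeTZDtype))  # True
```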
The problem can be circumvented using one of the following workarounds:
df.groupby('rtime')['value'].quantile([0.1,0.2])
Or:
df['timestamp'] = df['timestamp'].dt.tz_convert(None)
df.groupby('rtime').quantile([0.1,0.2])
Or:
df.pop('timestamp')
df.groupby('rtime').quantile([0.1,0.2])
This strongly suggests it is the presence of the TZ-aware extra column timestamp that makes the quantile function fail.
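Until this is fixed upstream, the workarounds can be wrapped in a small helper (a hypothetical sketch; quantile_drop_tz is my own name, not a pandas API) that drops any tz-aware columns other than the grouping key before calling quantile:

```python
import numpy as np
import pandas as pd

def quantile_drop_tz(df, by, q):
    """Hypothetical workaround: drop tz-aware columns (except the
    grouping key) before groupby().quantile() to avoid the TypeError."""
    tz_cols = [col for col in df.columns
               if isinstance(df[col].dtype, pd.DatetimeTZDtype) and col != by]
    return df.drop(columns=tz_cols).groupby(by).quantile(q)

# Usage with the MCVE above:
n = 200
c = np.random.choice([0, 1, 2], size=(n,))
d = np.random.randn(n)
t = pd.date_range(start='2020-04-19 00:00:00', freq='min', periods=n, tz='UTC')
df = pd.DataFrame(list(zip(c, t, d)), columns=['category', 'timestamp', 'value'])
df['rtime'] = df['timestamp'].dt.floor('1h')
res = quantile_drop_tz(df, 'rtime', [0.1, 0.5, 0.9])
```

The 200 one-minute timestamps floor to 4 distinct hours, so the result has one row per (hour, quantile) pair over the two remaining numeric columns.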
Expected Output
The expected output is that groupby operations on a dataframe holding Time Zone aware timestamps behave exactly as they do with TZ-naive timestamps.
Note: Thank you for building such a great tool; pandas is first-class middleware and your efforts are strongly appreciated. Let me know how I can help; I would be happy to understand how this can be corrected.
Output of pd.show_versions()
pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 9.0.1
setuptools : 46.1.3
Cython : 0.29.14
pytest : 5.3.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 0.999999999
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader : None
bs4 : 4.7.1
bottleneck : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : None
tabulate : 0.8.3
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : None
Issue Analytics
- Created 3 years ago
- Comments: 8 (7 by maintainers)
Top GitHub Comments
Go for it!
take