Quantile function fails when performing groupby on Time Zone Aware Timestamps

Code Sample, a copy-pastable example if possible

This may not be a high-priority bug, but I have the feeling it can be fixed easily. I just do not understand enough about how it should be fixed. Below is an MCVE that reproduces it:

import numpy as np
import pandas as pd

# Sample Dataset:
n = 200
c = np.random.choice([0,1,2], size=(n,))
d = np.random.randn(n)
t = pd.date_range(start='2020-04-19 00:00:00', freq='1T', periods=n, tz='UTC')
df = pd.DataFrame([r for r in zip(c, t, d)], columns=['category', 'timestamp', 'value'])
df['rtime'] = df['timestamp'].dt.floor('1H')

# Failing operation:
df.groupby('rtime').quantile([0.1, 0.5, 0.9])

Problem description

The traceback is rather terse, and I do not have enough experience with the pandas source code to understand all the details of this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-aff2c9a206a6> in <module>
----> 1 df.groupby('rtime').quantile([0.1,0.2])

/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in quantile(self, q, interpolation)
   1926                     interpolation=interpolation,
   1927                 )
-> 1928                 for qi in q
   1929             ]
   1930             result = concat(results, axis=0, keys=q)

/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in <listcomp>(.0)
   1926                     interpolation=interpolation,
   1927                 )
-> 1928                 for qi in q
   1929             ]
   1930             result = concat(results, axis=0, keys=q)

/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in _get_cythonized_result(self, how, cython_dtype, aggregate, needs_values, needs_mask, needs_ngroups, result_is_index, pre_processing, post_processing, **kwargs)
   2289                 func = partial(func, ngroups)
   2290 
-> 2291             func(**kwargs)  # Call func to modify indexer values in place
   2292 
   2293             if result_is_index:

pandas/_libs/groupby.pyx in pandas._libs.groupby.__pyx_fused_cpdef()

TypeError: No matching signature found

I have found similar issues on GitHub with the same exception, but I think the exception is too generic to conclude that they are the same problem. Additionally, I may have found a simple corner case involving TZ-aware timestamps.

I had a hard time reproducing the error when building the MCVE. I finally found out that it is related to the presence of an extra column holding time zone aware timestamps.

Maybe the fix is just a matter of updating the function signature to accept TZ-aware timestamps.

The problem can be circumvented using one of the following forms:

df.groupby('rtime')['value'].quantile([0.1,0.2])

Or:

df['timestamp'] = df['timestamp'].dt.tz_convert(None)
df.groupby('rtime').quantile([0.1,0.2])

Or:

df.pop('timestamp')
df.groupby('rtime').quantile([0.1,0.2])

This strongly suggests that it is the presence of the extra TZ-aware column timestamp that makes the quantile function fail.
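The first workaround can be generalized so that the column names do not have to be listed by hand. The following is only a sketch: the helper name numeric_group_quantile is hypothetical (it is not part of pandas), and it assumes that restricting the aggregation to numeric columns is acceptable for your use case. It relies only on DataFrame.select_dtypes and the column-selection form of groupby already shown above.

```python
import numpy as np
import pandas as pd


def numeric_group_quantile(frame, key, q):
    """Hypothetical helper: group by `key` and compute quantiles
    for the numeric columns only, skipping datetime columns."""
    numeric_cols = frame.select_dtypes(include="number").columns.difference([key])
    return frame.groupby(key)[list(numeric_cols)].quantile(q)


# Same shape of data as the MCVE, built with a seeded generator.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "category": rng.choice([0, 1, 2], size=n),
    "timestamp": pd.date_range("2020-04-19", periods=n, freq="min", tz="UTC"),
    "value": rng.standard_normal(n),
})
demo["rtime"] = demo["timestamp"].dt.floor("h")

# The TZ-aware `timestamp` column is never passed to quantile.
result = numeric_group_quantile(demo, "rtime", [0.1, 0.5, 0.9])
print(result)
```

The TZ-aware column stays in the original DataFrame, so nothing is lost, but quantiles of the timestamps themselves are not computed with this approach.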

Expected Output

The expected behavior is that groupby operations on a DataFrame holding time zone aware timestamps work the same way as they do with TZ-naive timestamps.
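As a rough illustration of that expectation, the sketch below compares the quantiles of the value column when the grouping key is TZ-aware and when it has been converted to TZ-naive. It deliberately uses the column-selection workaround in both cases, so it does not exercise the failing code path; the variable names are assumptions made for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "timestamp": pd.date_range("2020-04-19", periods=n, freq="min", tz="UTC"),
    "value": rng.standard_normal(n),
})
df["rtime"] = df["timestamp"].dt.floor("h")

q = [0.1, 0.5, 0.9]

# Grouping key is TZ-aware (UTC).
aware = df.groupby("rtime")["value"].quantile(q)

# Same data, grouping key converted to TZ-naive.
naive_df = df.assign(rtime=df["rtime"].dt.tz_convert(None))
naive = naive_df.groupby("rtime")["value"].quantile(q)

# The numeric results should not depend on the time zone awareness of the key.
values_match = np.allclose(aware.to_numpy(), naive.to_numpy())
print(values_match)
```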

Note: Thank you for building such a great tool; pandas is first-class middleware. Your efforts are greatly appreciated. Let me know how I can help; I would be happy to understand how this can be corrected.

Output of pd.show_versions()

commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-91-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 9.0.1
setuptools : 46.1.3
Cython : 0.29.14
pytest : 5.3.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 0.999999999
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
pytest : 5.3.2
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : None
tabulate : 0.8.3
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8
numba : None

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

mroeschke commented, Jun 28, 2022 (1 reaction)

"hi @mroeschke, is this issue for adding a test case still open? If so, can I pick this?"

Go for it!

dannyi96 commented, Jun 28, 2022 (0 reactions)

take
