BUG: DataFrame.groupby drops timedelta column in v1.3.0

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
df = pd.DataFrame(
    {
        "duration":[pd.Timedelta(i, unit="days") for i in range(1,6)],
        "value":[1,0,1,2,1]
    }
)

df.groupby("value").sum()

results in:

Empty DataFrame
Columns: []
Index: [0, 1, 2]

The same behaviour occurs with other aggregations such as sum, mean, and median.

Note that selecting the column before aggregating gives close to the expected result (a Series rather than a DataFrame):

df.groupby("value")["duration"].sum()
value
0   2 days
1   9 days
2   4 days
Name: duration, dtype: timedelta64[ns]
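
A minimal sketch of how the Series result above could be turned into the one-column DataFrame the report expects. It relies only on the column-selection path that the report shows working, plus `Series.to_frame()`; the variable name `result` is illustrative and not from the original issue.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "duration": [pd.Timedelta(i, unit="days") for i in range(1, 6)],
        "value": [1, 0, 1, 2, 1],
    }
)

# Aggregate only the timedelta column (a SeriesGroupBy), then promote
# the resulting Series to a one-column DataFrame indexed by "value".
result = df.groupby("value")["duration"].sum().to_frame()
print(result)
```

This sidesteps the DataFrame-level aggregation entirely, so it does not depend on how that path treats non-numeric columns.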

Also, this problem may be unique to Timedeltas, as the following similar example with an integer column has no problem:

df = pd.DataFrame(
    {
        "duration":[i for i in range(1,6)],
        "value":[1,0,1,2,1]
    }
)

df.groupby("value").sum()

gives:

       duration
value
0             2
1             9
2             4

Problem description

In v1.3.0 (and master, commit dad3e7), a timedelta column goes missing when a groupby aggregation is performed on a DataFrame. This behaviour does not occur in pandas 1.2.5.

Expected Output

The expected output is given by pandas 1.2.5:

      duration
value
0       2 days
1       9 days
2       4 days

It is a DataFrame with one column, “duration”, indexed by the groupby keys.
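
One possible way to ask for this output explicitly is to pass `numeric_only=False` to the aggregation, which is the parameter the maintainers discuss in the comments. The sketch below assumes that doing so keeps the timedelta column; whether it restores the 1.2.5 output on 1.3.0 specifically was not verified in the original report.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "duration": [pd.Timedelta(i, unit="days") for i in range(1, 6)],
        "value": [1, 0, 1, 2, 1],
    }
)

# Ask explicitly for non-numeric columns to be included, instead of
# relying on the version-dependent default of numeric_only.
result = df.groupby("value").sum(numeric_only=False)

# Compare against the per-group totals listed in the expected output.
expected_days = [2, 9, 4]
matches = result["duration"].tolist() == [pd.Timedelta(days=d) for d in expected_days]
print(result)
print("matches expected output:", matches)
```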

Output of pd.show_versions()

INSTALLED VERSIONS

commit           : f00ed8f47020034e752baf0250483053340971b0
python           : 3.7.5.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.19041
machine          : AMD64
processor        : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None

pandas           : 1.3.0
numpy            : 1.21.0
pytz             : 2021.1
dateutil         : 2.8.1
pip              : 19.2.3
setuptools       : 41.2.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 33 (33 by maintainers)

Top GitHub Comments

rhshadrach commented, Jul 18, 2021 (2 reactions)

+1 on defaulting numeric_only to False. To have code

df[["a", "b"]].groupby("a").sum()

not result in summing b is surprising.
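
To make the effect of `numeric_only` concrete, here is a small sketch contrasting the two settings. The frame below is hypothetical: `b` is given a timedelta dtype to mirror the report, and an integer column `c` is added (not part of the comment's example) so that the difference is visible in the resulting columns.

```python
import pandas as pd

# Hypothetical data: "b" is timedelta, "c" is an ordinary integer column.
df = pd.DataFrame(
    {
        "a": [1, 1, 2],
        "b": pd.to_timedelta([1, 2, 3], unit="D"),
        "c": [10, 20, 30],
    }
)

# pandas does not classify timedelta64 as a numeric dtype.
b_is_numeric = pd.api.types.is_numeric_dtype(df["b"])

# numeric_only=True restricts the aggregation to numeric columns.
numeric_sum = df.groupby("a").sum(numeric_only=True)

# numeric_only=False aggregates every column that supports the operation.
full_sum = df.groupby("a").sum(numeric_only=False)

print("b is numeric:", b_is_numeric)
print(numeric_sum.columns.tolist())
print(full_sum.columns.tolist())
```

Passing the argument explicitly makes the result independent of which default a given pandas release uses.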

jbrockmendel commented, Aug 31, 2021 (1 reaction)

> I’m +1 on removing the warning for 1.3.x and only having a FutureWarning in 1.4 for the pending behavior change. @jbrockmendel - any thoughts here?

I’m fine with that.
