Inconsistent output in GroupBy.apply returning a DataFrame
This is a continuation of https://github.com/pandas-dev/pandas/issues/13056 and https://github.com/pandas-dev/pandas/issues/14927, which were closed by https://github.com/pandas-dev/pandas/pull/31613. I think that PR ensured that we consistently take one of two code paths. This issue is to verify that we actually want the behavior on master.
Focusing on a specific pair of examples that differ only in whether the returned index is the same or not:
# master
In [10]: def f(x):
    ...:     return x.copy()  # same index

In [11]: def g(x):
    ...:     return x.copy().rename(lambda x: x + 1)  # different index

In [12]: df = pd.DataFrame({"A": ['a', 'b'], "B": [1, 2]})

In [13]: df.groupby("A").apply(f)
Out[13]:
   A  B
0  a  1
1  b  2

In [14]: df.groupby("A").apply(g)
Out[14]:
     A  B
A
a 1  a  1
b 2  b  2

# 1.0.4
In [8]: df.groupby("A").apply(f)
Out[8]:
     A  B
A
a 0  a  1
b 1  b  2

In [9]: df.groupby("A").apply(g)
Out[9]:
     A  B
A
a 1  a  1
b 2  b  2
So the 1.0.4 behavior is to always prepend the group keys to the result as an index level. In pandas 1.1.0, whether the group keys are prepended depends on whether the UDF returns a DataFrame whose index is identical to the input group's index. Do we want that kind of value-dependent behavior?
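To see which of the two behaviors a given installation shows, a small check along the following lines may help. It is only a sketch: keys_prepended is a hypothetical helper (not a pandas API) that treats "the result index has more than one level" as "the group keys were prepended", which holds here only because the input frame has a single-level index. The output for the same-index case is expected to vary between pandas versions, so none is shown.

```python
import warnings

import pandas as pd


def f(x):
    return x.copy()  # same index as the incoming group


def g(x):
    return x.copy().rename(lambda i: i + 1)  # shifted, so a different index


def keys_prepended(result):
    """Hypothetical helper: True if the result index has more than one level."""
    return result.index.nlevels > 1


df = pd.DataFrame({"A": ["a", "b"], "B": [1, 2]})

with warnings.catch_warnings():
    # Some newer pandas releases warn about apply operating on the grouping
    # column; that is unrelated to the index question, so it is silenced here.
    warnings.simplefilter("ignore")
    same_index = df.groupby("A").apply(f)
    different_index = df.groupby("A").apply(g)

print("pandas", pd.__version__)
print("same index      -> keys prepended:", keys_prepended(same_index))
print("different index -> keys prepended:", keys_prepended(different_index))
```

If the two printed flags differ, the installed version has the value-dependent behavior described above.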
@jorisvandenbossche’s notebook from https://github.com/pandas-dev/pandas/issues/13056#issuecomment-403300216 might be helpful, though it might be out of date.
Top GitHub Comments
Thanks. It looks like this is essentially inconsistent handling of the existing group_keys argument, which https://github.com/pandas-dev/pandas/pull/34998 is trying to clean up. Hopefully this doesn’t make things too much more complicated for cudf.

Removing the milestone and blocker label.
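For readers unfamiliar with the group_keys argument mentioned in the comments, here is a minimal sketch of passing it explicitly to DataFrame.groupby. It uses only the different-index function g from the issue, since that is the case where the argument was already honored. How group_keys=True interacts with a same-index result has differed between pandas versions, so that case is deliberately left out.

```python
import warnings

import pandas as pd


def g(x):
    return x.copy().rename(lambda i: i + 1)  # different index from the group


df = pd.DataFrame({"A": ["a", "b"], "B": [1, 2]})

with warnings.catch_warnings():
    # Silence version-specific warnings about the grouping column; they do
    # not affect the shape of the resulting index.
    warnings.simplefilter("ignore")
    with_keys = df.groupby("A", group_keys=True).apply(g)
    without_keys = df.groupby("A", group_keys=False).apply(g)

# With group_keys=True the group label becomes an extra outer index level;
# with group_keys=False only the index returned by g is kept.
print(with_keys.index.nlevels, list(with_keys.index))
print(without_keys.index.nlevels, list(without_keys.index))
```

Passing group_keys explicitly makes the intent visible at the call site rather than relying on what the UDF happens to return.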