
DataFrame.GroupBy.apply has unexpected returns in some cases


Code Sample, a copy-pastable example if possible

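import numpy as np
import pandas as pd

# Helper defined in the original report; the lambda below does the same addition inline.
def agg(a, b):
    return a + b

# Ten rows; 'B' is constant, so grouping on it yields exactly one group.
x = pd.DataFrame({'A': np.arange(10), 'B': [1] * 10, 'C': np.random.rand(10), 'D': np.random.rand(10)}).set_index(['A', 'B'])

# The applied function takes each group's DataFrame and returns a Series.
x.groupby('B').apply(lambda g: g.C + g.D)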

Problem description

This returns a DataFrame of shape (1, 10)

This seems to occur when both of the following hold: (i) the groupby key has only a single unique value, so there is exactly one group, and (ii) the applied function takes a DataFrame and returns a Series.
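To make the two conditions easier to check, here is a minimal sketch that runs the same `apply` once with a single unique key and once with two keys, then prints the type and shape of each result. The helpers `make_frame` and `describe` are hypothetical names introduced only for this comparison. The output is deliberately not hard-coded, since it may differ between pandas versions; the report above describes a `(1, 10)` DataFrame for the single-key case on pandas 0.25.1.

```python
import numpy as np
import pandas as pd


def make_frame(keys):
    """Build a frame like the one in the report, using `keys` as the 'B' level."""
    n = len(keys)
    return pd.DataFrame(
        {"A": np.arange(n), "B": keys, "C": np.random.rand(n), "D": np.random.rand(n)}
    ).set_index(["A", "B"])


def describe(result):
    """Return the type name and shape of a groupby result."""
    return type(result).__name__, result.shape


# Same applied function in both cases; only the number of unique keys differs.
one_group = make_frame([1] * 10).groupby("B").apply(lambda g: g.C + g.D)
two_groups = make_frame([1] * 5 + [2] * 5).groupby("B").apply(lambda g: g.C + g.D)

print("one unique key: ", describe(one_group))
print("two unique keys:", describe(two_groups))
```

If the two printed shapes differ in dimensionality, the installed pandas version shows the behavior described here.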

Expected Output

I expected it to return a Series of length 10 (one value per row), not a DataFrame with a single row.

Output of `pd.show_versions()`


INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-72-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.1
numpy : 1.17.2
pytz : 2019.3
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.4.0
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: 0.8.1
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.9
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.1

Issue Analytics

State:
Created 4 years ago
Reactions: 3
Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions

dsaxton commented, Jan 19, 2020

It looks like there’s a check for this type of case here https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/generic.py#L1268 that sends the code down separate paths depending on the number of unique values. When there is only one unique value, the result is explicitly unstacked here https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/generic.py#L1306.

I’m not sure if this is by design with some other situation in mind, but I’d agree that the shape of the output shouldn’t depend on the cardinality of the thing you’re grouping on.

1 reaction

krlng commented, Jan 6, 2021

My bad workaround right now is a simple loop, but I really don’t like it: it’s less Pythonic and much slower when there are many groups. However, modifying the internals of pandas is not an option for me. From my point of view, this makes apply simply unusable for production data pipelines.
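The comment does not show the loop itself, so the following is only a sketch of what such a workaround might look like, not the commenter's actual code. The helper name `apply_per_group` is hypothetical. It iterates over the groups explicitly and joins the per-group Series with `pd.concat`, which returns a Series whether there is one group or many. It is still a Python-level loop, so the performance concern raised above applies.

```python
import numpy as np
import pandas as pd


def apply_per_group(df, by, func):
    """Hypothetical helper: call `func` on each group and concatenate the results.

    Assumes `func` returns a Series for every group. The group key becomes
    the outermost index level of the combined Series.
    """
    pieces = {key: func(group) for key, group in df.groupby(by)}
    return pd.concat(pieces)


# Same frame as in the report: 'B' has a single unique value.
x = pd.DataFrame(
    {"A": np.arange(10), "B": [1] * 10, "C": np.random.rand(10), "D": np.random.rand(10)}
).set_index(["A", "B"])

result = apply_per_group(x, "B", lambda g: g.C + g.D)
print(type(result).__name__, result.shape)
```

For the specific function in the report, which only adds two columns row by row, the plain expression `x.C + x.D` gives the same values without any grouping; the loop is only needed when the per-group function really depends on the group.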
