DataFrame.GroupBy.apply has unexpected returns in some cases


Code Sample, a copy-pastable example if possible

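import numpy as np
import pandas as pd

# Frame indexed by (A, B); the group key B holds a single unique value.
x = pd.DataFrame({'A': np.arange(10), 'B': [1] * 10, 'C': np.random.rand(10), 'D': np.random.rand(10)}).set_index(['A', 'B'])

# The applied function receives each group's DataFrame and returns a Series.
x.groupby('B').apply(lambda g: g.C + g.D)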

Problem description

This returns a DataFrame of shape (1, 10)

This seems to occur when (i) the groupby key has only one unique value and (ii) the applied function takes a DataFrame and returns a Series.
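To make the two conditions easier to check, here is a minimal sketch that runs the same applied function against a frame with one group and a frame with two groups. The helper names `make_frame` and `add_columns` are hypothetical and not part of the original report. The script prints the result types and shapes instead of assuming them, since the outcome may differ between pandas versions.

```python
import numpy as np
import pandas as pd


def make_frame(b_values):
    """Build a frame like the one in the report, indexed by (A, B)."""
    n = len(b_values)
    rng = np.random.default_rng(0)
    return pd.DataFrame(
        {"A": np.arange(n), "B": b_values, "C": rng.random(n), "D": rng.random(n)}
    ).set_index(["A", "B"])


def add_columns(g):
    # Receives one group's DataFrame and returns a Series.
    return g["C"] + g["D"]


# Same applied function, different number of unique values in the key B.
one_group = make_frame([1] * 10).groupby("B").apply(add_columns)
two_groups = make_frame([1] * 5 + [2] * 5).groupby("B").apply(add_columns)

# The report describes a (1, 10) DataFrame for the single-group case
# on pandas 0.25.1; compare both results on your own version.
print(type(one_group).__name__, one_group.shape)
print(type(two_groups).__name__, two_groups.shape)
```

If the two printed lines disagree in type or dimensionality, the installed pandas version shows the behavior described in this issue.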

Expected Output

I expect it to return a Series with 10 entries, one per row of the original frame, rather than a (1, 10) DataFrame.

Output of pd.show_versions()


INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-72-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.1
numpy : 1.17.2
pytz : 2019.3
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.4.0
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader : 0.8.1
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.9
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.1

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 3
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
dsaxton commented, Jan 19, 2020

It looks like there’s a check for this type of case at https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/generic.py#L1268 that sends the code down separate paths depending on the number of unique values. When there is only one, the result is explicitly unstacked at https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/generic.py#L1306

I’m not sure if this is by design with some other situation in mind, but I’d agree that the shape of the output shouldn’t depend on the cardinality of the thing you’re grouping on.

1 reaction
krlng commented, Jan 6, 2021

My bad workaround right now is a simple loop, but I really don’t like it: it’s less pythonic and much slower when there are many groups. However, modifying the internals of pandas is not an option for me. From my point of view, this makes apply simply unusable for production data pipelines.
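The comment does not show the loop itself, so the following is only a sketch of what such a workaround could look like. The helper `apply_per_group` and the level name `"group"` are hypothetical. It calls the function on each group and joins the pieces with `pd.concat`, so a Series-returning function yields a Series whether the key has one unique value or many.

```python
import numpy as np
import pandas as pd


def apply_per_group(frame, key, func):
    """Hypothetical helper: call func on each group and concatenate the results."""
    pieces = {name: func(group) for name, group in frame.groupby(key)}
    # The dict keys become an extra outer index level named "group".
    return pd.concat(pieces, names=["group"])


# Same frame as in the report: the key B has a single unique value.
x = pd.DataFrame(
    {"A": np.arange(10), "B": [1] * 10, "C": np.random.rand(10), "D": np.random.rand(10)}
).set_index(["A", "B"])

result = apply_per_group(x, "B", lambda g: g["C"] + g["D"])
print(type(result).__name__, result.shape)
```

As the comment notes, this runs a Python-level loop over the groups, so it is likely to be slower than `apply` when there are many groups.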
