Behavior change in 1.4.0rc0 when groupby `apply` returns a copy of the passed DataFrame

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=["A", "B", "C", "D"])
grouped = df.groupby(["A", "B"], as_index=True)[["C", "D"]]

# Each group is returned as a copy: same index values, new index object
combined = grouped.apply(lambda x: x.copy())

combined.reset_index().columns
# Pandas 1.3: Index(['A', 'B', 'level_2', 'C', 'D'], dtype='object')
# Pandas 1.4: Index(['index', 'C', 'D'], dtype='object')

Issue Description

While testing our code base against 1.4.0rc0, I stumbled upon a behavior change in pandas 1.4.

Previously, when `apply` was called on a groupby object, the returned index appears to have been honored: the result had a MultiIndex made of the group keys plus the original index.
With 1.4, the group keys are dropped from the index.

I have no idea whether this change is intentional, but wanted to let you know to be on the safe side. I also did not check why our code does what it does (the function called in `apply` is fairly complex, not just a copy). It looks a bit fishy…

Expected Behavior

Unchanged behavior.

Installed Versions

Replace this line with the output of pd.show_versions()

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
rhshadrach commented, Jan 15, 2022

That sounds good @simonjayhawkins - I should be able to get a PR up by the end of today.

1 reaction
rhshadrach commented, Jan 14, 2022

@aberres - thanks for the report! The change here was not intentional, in that it was a behavior change that was not noticed until after it was merged, and it shipped without a deprecation notice. However, it does move pandas in the right direction of consistency, and would be very hard to revert, so I am +1 on moving forward with the behavior in 1.4 as-is.

`apply` was meant to be flexible, handling aggregations, transforms, and everything in between. The result differs when `apply` detects that an operation was a transform. Previously, in 1.3, pandas used two different definitions of whether something was a transform (i.e. whether the index was “mutated”): one code path used “is” whereas another used “equals” when determining whether the input index is the same as the output index. The PR highlighted by @simonjayhawkins removed this inconsistency.
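
To make the “is” versus “equals” distinction concrete, here is a minimal sketch. The two helper functions are hypothetical and only illustrate the two definitions described in the comment; they are not the actual pandas internals. Because `x.copy()` returns a new index object holding the same values, the two checks disagree about whether the index was mutated.

```python
import pandas as pd


# Hypothetical helpers (NOT pandas internals): two ways to decide whether
# the function passed to apply "mutated" the group's index.
def index_mutated_by_identity(group_index: pd.Index, result_index: pd.Index) -> bool:
    # "is" check: any new Index object counts as a mutation
    return result_index is not group_index


def index_mutated_by_equality(group_index: pd.Index, result_index: pd.Index) -> bool:
    # "equals" check: only a change in the index values counts as a mutation
    return not result_index.equals(group_index)


group = pd.DataFrame({"C": [3], "D": [4]})
result = group.copy()  # what the lambda in the reproducible example returns

print(index_mutated_by_identity(group.index, result.index))  # True
print(index_mutated_by_equality(group.index, result.index))  # False
```

Under the identity check the copy looks like a non-transform, so the group keys would be prepended; under the equality check it looks like a transform. The 1.4 output in the reproducible example is consistent with the copy being treated as a transform.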

Being able to control what happens with the group keys in apply (regardless of whether pandas detects a transform) would be resolved by #34998.
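
Until the group keys can be controlled directly, one version-independent workaround is to skip `apply` and build the result with `pd.concat`, so the group keys are always prepended regardless of whether the function looks like a transform. This is a sketch of our own, not something proposed in the thread, and it assumes the per-group function (here just `copy()`) returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=["A", "B", "C", "D"])
grouped = df.groupby(["A", "B"])[["C", "D"]]

# Build the result by hand instead of relying on apply's transform detection:
# iterating yields (key_tuple, sub_frame) pairs, and concat turns each key
# tuple into the "A" and "B" index levels in front of the original index.
combined = pd.concat(
    {key: group.copy() for key, group in grouped},
    names=["A", "B"],
)

print(list(combined.reset_index().columns))
```

The resulting column list, `['A', 'B', 'level_2', 'C', 'D']`, matches the pandas 1.3 output shown in the reproducible example.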
