Behavior change of 1.4.rc0 when group by `apply` returns a copy of the passed data frame
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=["A", "B", "C", "D"])
grouped = df.groupby(["A", "B"], as_index=True)[["C", "D"]]
combined = grouped.apply(lambda x: x.copy())
combined.reset_index().columns
# Pandas 1.3: Index(['A', 'B', 'level_2', 'C', 'D'], dtype='object')
# Pandas 1.4: Index(['index', 'C', 'D'], dtype='object')
Issue Description
While testing our code base against 1.4rc0, I stumbled upon a behavior change in pandas 1.4.
Formerly, when `apply` was called on a grouper, a returned multi-index appeared to be honored in the resulting data frame.
With 1.4 the index is dropped.
I have no idea whether this is an intentional change; I wanted to let you know, to be on the safe side. I also did not check why our code does what it does (the method called in `apply` is pretty complex, not just a copy). It looks a bit fishy…
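For code that relies on the group keys appearing in the index, one possible way to avoid depending on how `apply` treats the result is to attach the keys explicitly. The following is only a sketch under that assumption, not something proposed in the issue: it iterates over the groups and uses `pd.concat` with a dict so the keys become index levels. The names `pieces` and `combined_explicit` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=["A", "B", "C", "D"])

# Build one piece per group, keyed by the (A, B) tuple of that group.
# Columns are selected explicitly so the result does not depend on how
# iteration over a column-selected groupby behaves in a given version.
pieces = {
    key: group[["C", "D"]].copy()
    for key, group in df.groupby(["A", "B"])
}

# Concatenating a dict prepends its keys as outer index levels;
# `names` labels those levels, and the original row index stays unnamed.
combined_explicit = pd.concat(pieces, names=["A", "B"])

print(combined_explicit.reset_index().columns.tolist())
```

This reproduces the column layout reported for pandas 1.3 above, at the cost of bypassing `apply` altogether.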
Expected Behavior
Unchanged behavior.
Installed Versions
Replace this line with the output of pd.show_versions()
Issue Analytics
- State:
- Created 2 years ago
- Comments:8 (8 by maintainers)
That sounds good @simonjayhawkins - I should be able to get a PR up by the end of today.
@aberres - thanks for the report! The change here was not intentional, in that it was a behavior change that was not noticed until after it was merged, and it landed without a deprecation notice. However, it does move pandas in the right direction of consistency, and would be very hard to revert, so I am +1 on moving forward with the behavior in 1.4 as-is.
`apply` was meant to be flexible, handling aggregations, transforms, and everything in between. The result is different when `apply` detects an operation was a transform. Previously, in 1.3, pandas used two different definitions of whether something was a transform (i.e. whether the index was “mutated”) - one code path used “is” whereas another code path used “equals” when determining whether the input index is the same as the output index. The PR highlighted by @simonjayhawkins removed this inconsistency.
Being able to control what happens with the group keys in `apply` (regardless of whether pandas detects a transform) would be resolved by #34998.
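To illustrate the “is” versus “equals” distinction mentioned above, here is a minimal sketch using plain `pd.Index` objects. It is not the internal pandas code path, only a demonstration of why the two checks can disagree for a function such as `lambda x: x.copy()`; the variable names are illustrative.

```python
import pandas as pd

original = pd.Index([0, 1, 2])
same_object = original          # the very same Index object
equal_copy = original.copy()    # a new object with identical contents

# Identity check: true only when both names refer to one object.
identity_same = same_object is original
identity_copy = equal_copy is original

# Value check: true whenever the contents match.
equality_copy = equal_copy.equals(original)

print(identity_same, identity_copy, equality_copy)
```

A copied index fails the identity check but passes the value check, so a transform test based on “is” and one based on “equals” would classify the same result differently.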