SeriesGroupBy.first / last loses categorical dtype
On 1.0.3 and master:
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3]})
df["b"] = df["a"].astype("category")
print(df.groupby("a")["b"].first())
print(df.groupby("a")["b"].last())
gives
a
1 1
2 2
3 3
Name: b, dtype: int64
a
1 1
2 2
3 3
Name: b, dtype: int64
but the dtype should still be categorical and not int64. This seemingly wrong output is explicitly tested for here: https://github.com/pandas-dev/pandas/blob/master/pandas/tests/groupby/aggregate/test_aggregate.py#L461
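A quick way to check whether a given pandas install is affected is to compare the result dtype against the dtype of the original column. This is only a sketch built on the reproducer above; the names `expected_dtype` and `results` are illustrative, and the printed values depend on the pandas version in use.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df["b"] = df["a"].astype("category")

# The dtype the aggregated column is expected to keep.
expected_dtype = df["b"].dtype

results = {}
for method in ("first", "last"):
    # Call SeriesGroupBy.first / SeriesGroupBy.last by name.
    result = getattr(df.groupby("a")["b"], method)()
    results[method] = result.dtype
    # Prints the method, the dtype it returned, and whether it is still categorical.
    print(method, result.dtype, result.dtype == expected_dtype)
```

If either line prints `False`, the categorical dtype was dropped by that method.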
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:5 (4 by maintainers)
Top GitHub Comments
Not sure if I should create a new issue or not, but I also wanted to point out some inconsistencies regarding this bug. When using agg(), the output dtype changes depending on whether we use dictionary notation or not.
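The snippet that went with this comment is not available here, so the following is only a guess at the kind of comparison meant: it assumes "dictionary notation" refers to passing a `{column: function}` mapping to `DataFrameGroupBy.agg`, as opposed to passing the function name directly. The `dtypes` dict is an illustrative name, and no particular output is claimed since it varies by pandas version.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df["b"] = df["a"].astype("category")

gb = df.groupby("a")

dtypes = {
    # Aggregation given as a plain string.
    "string": gb.agg("first")["b"].dtype,
    # Aggregation given as a {column: function} mapping (assumed notation).
    "dict": gb.agg({"b": "first"})["b"].dtype,
}

for notation, dtype in dtypes.items():
    print(notation, dtype)
```

If the two printed dtypes differ, the inconsistency described above is present in that version.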
Looks like this is fixed on master. It could use a bisection to find the fixing commit, plus a test.