SeriesGroupBy.first / last loses categorical dtype

On pandas 1.0.3 and master:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df["b"] = df["a"].astype("category")

print(df.groupby("a")["b"].first())
print(df.groupby("a")["b"].last())

gives

a
1    1
2    2
3    3
Name: b, dtype: int64
a
1    1
2    2
3    3
Name: b, dtype: int64

but the dtype should still be categorical and not int64. This seemingly wrong output is explicitly tested for here: https://github.com/pandas-dev/pandas/blob/master/pandas/tests/groupby/aggregate/test_aggregate.py#L461
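
On affected versions, one possible workaround (not taken from the issue itself) is to cast the aggregated result back to the source column's dtype. The sketch below assumes that approach; `first_keep_dtype` is a hypothetical helper name, not a pandas API. The cast is harmless on versions where the dtype is already preserved.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df["b"] = df["a"].astype("category")


def first_keep_dtype(frame, by, col):
    """Hypothetical helper: take the first value per group, then cast the
    result back to the original column's dtype (categorical here)."""
    result = frame.groupby(by)[col].first()
    # astype with the original CategoricalDtype restores the categories
    # whether or not groupby dropped them.
    return result.astype(frame[col].dtype)


restored = first_keep_dtype(df, "a", "b")
print(restored.dtype)
```

The same cast works for `last()`; only the aggregation call inside the helper changes.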

Top GitHub Comments

Puumanamana commented, Jun 1, 2020

Not sure if I should create a new issue or not, but I also wanted to point out some inconsistencies regarding this bug. When using agg(), the output dtype changes depending on whether we use a dictionary notation or not (see below):

import pandas as pd # v1.0.4

# Same example data as above
df = pd.DataFrame({"a": [1, 2, 3]})
df["b"] = df["a"].astype("category")

print(df.groupby("a").agg("first").b) # Categorical dtype
print(df.groupby("a").agg({"b": "first"}).b) # int64 dtype

gives

a
1    1
2    2
3    3
Name: b, dtype: category
Categories (3, int64): [1, 2, 3]

a
1    1
2    2
3    3
Name: b, dtype: int64
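
To see whether a given pandas installation shows this inconsistency, a small script along these lines can collect the resulting dtype for each spelling of the aggregation. This is only a sketch: the dictionary labels and the `consistent` flag are illustrative names, and the printed dtypes depend on the installed pandas version.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df["b"] = df["a"].astype("category")

grouped = df.groupby("a")

# Record the dtype of column "b" for each way of asking for "first".
dtypes = {
    'agg("first")': grouped.agg("first")["b"].dtype,
    'agg({"b": "first"})': grouped.agg({"b": "first"})["b"].dtype,
    '["b"].first()': grouped["b"].first().dtype,
}

for spelling, dtype in dtypes.items():
    print(f"{spelling:<22} -> {dtype}")

# True only if every spelling produced the same dtype.
consistent = len({str(d) for d in dtypes.values()}) == 1
print("consistent:", consistent)
```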

jbrockmendel commented, May 24, 2021

Looks like this is fixed on master. Could use a bisection and a test.
