pandas groupby/aggregation operation changes 'key' column dtype

Reference: https://stackoverflow.com/questions/34059704/pandas-groupby-aggregation-operation-changes-key-column-dtype

When I perform a groupby and aggregation on the original DataFrame, the dtype of the 'key' column changes from int32 to int64. Here is a simple example:

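import numpy as np
import pandas as pd

# Build a frame whose columns are all int32.
df = pd.DataFrame({'id': np.array([3234, 332635, 325993]), 'amount': np.array([34, 43, 32])},
                  index=['a', 'b', 'c'],
                  dtype='int32')
print(df.info())

# Group on 'id' and aggregate (on pandas 0.20.3, 'id' comes back as int64).
df = df.groupby('id', as_index=False).sum()
print(df.info())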

The output is:

  <class 'pandas.core.frame.DataFrame'>
  Index: 3 entries, a to c
  Data columns (total 2 columns):
  amount    3 non-null int32
  id        3 non-null int32
  dtypes: int32(2)
  memory usage: 48.0+ bytes
  None
  
  <class 'pandas.core.frame.DataFrame'>
  Int64Index: 3 entries, 0 to 2
  Data columns (total 2 columns):
  id        3 non-null int64
  amount    3 non-null int32
  dtypes: int32(1), int64(1)
  memory usage: 60.0 bytes
  None
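One possible workaround, which is not taken from the original thread, is to record the key column's dtype before grouping and cast it back afterwards. The following is a minimal sketch under that assumption; the names `key_dtype` and `result` are illustrative only, and on pandas versions that already preserve the key dtype the cast is simply a no-op.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"id": np.array([3234, 332635, 325993]), "amount": np.array([34, 43, 32])},
    index=["a", "b", "c"],
    dtype="int32",
)

# Remember the dtype of the grouping key before aggregating.
key_dtype = df["id"].dtype

result = df.groupby("id", as_index=False).sum()

# Cast the key column back to its original dtype (assumed workaround).
result["id"] = result["id"].astype(key_dtype)

print(result.dtypes)
```

Casting after the fact keeps the groupby call itself unchanged, at the cost of one extra copy of the key column.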

pd.show_versions():

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.20.3
pytest: 2.8.5
pip: 9.0.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None
None

Issue Analytics

Created 6 years ago
Comments: 5 (3 by maintainers)

Top GitHub Comments

3 reactions

daragallagher commented, May 11, 2020

I know this is an old and closed issue but I was bitten by this issue recently.

And I found myself confused by the above comment (“we don’t allow non int64 indexers in a result index”):

df = pd.DataFrame({
    'a': np.array([1], dtype=np.int32),
    'b': np.array([1.23], dtype=np.float64),
    'c': np.array([3], dtype=np.uint8),
    'd': np.array([7.8], dtype=np.float16),
    'e': np.array([False], dtype=np.bool_)})

I can get a bunch of indexers:

In [4]: df.groupby('a').sum().index
Out[4]: Int64Index([1], dtype='int64', name='a')

In [5]: df.groupby('b').sum().index
Out[5]: Float64Index([1.23], dtype='float64', name='b')

In [6]: df.groupby('c').sum().index
Out[6]: UInt64Index([3], dtype='uint64', name='c')

In [7]: df.groupby('d').sum().index
Out[7]: Float64Index([7.80078125], dtype='float64', name='d')

In [8]: df.groupby('e').sum().index
Out[8]: Index([False], dtype='object', name='e')

I’m not sure how users are supposed to expect this behaviour, as the groupby documentation doesn’t mention it.
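To see what a given pandas installation does, the key dtype before grouping can be compared with the dtype of the resulting index. The sketch below is only illustrative: `key_dtype_report` and the value column `v` are hypothetical names introduced here, and the reported index dtypes depend on the pandas version, so no particular output is claimed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": np.array([1], dtype=np.int32),
    "b": np.array([1.23], dtype=np.float64),
    "c": np.array([3], dtype=np.uint8),
    "v": np.array([10], dtype=np.int64),  # hypothetical value column to aggregate
})


def key_dtype_report(frame, keys, value):
    """Map each key to (column dtype, dtype of the groupby result index)."""
    report = {}
    for key in keys:
        index = frame.groupby(key)[value].sum().index
        report[key] = (str(frame[key].dtype), str(index.dtype))
    return report


report = key_dtype_report(df, ["a", "b", "c"], "v")
for key, (before, after) in report.items():
    print(key, before, "->", after)
```

Running this on the version you deploy with shows directly whether narrower key dtypes are widened in the result index.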

0 reactions

rjafarau commented, May 14, 2021

Agreed with @daragallagher. This behavior is not obvious from the groupby documentation! @jreback
