pandas groupby/aggregation operation changes 'key' column dtype

Reference: https://stackoverflow.com/questions/34059704/pandas-groupby-aggregation-operation-changes-key-column-dtype

When I run a groupby and aggregation on a DataFrame, the dtype of the ‘key’ column changes from int32 to int64. Here is a simple example:

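import numpy as np
import pandas as pd

# Both columns start out as int32.
df = pd.DataFrame({'id': np.array([3234, 332635, 325993]), 'amount': np.array([34, 43, 32])},
                  index=['a', 'b', 'c'],
                  dtype='int32')
print(df.info())

# Group on 'id' and keep it as a regular column in the result.
df = df.groupby('id', as_index=False).sum()
print(df.info())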

output is:

  <class 'pandas.core.frame.DataFrame'>
  Index: 3 entries, a to c
  Data columns (total 2 columns):
  amount    3 non-null int32
  id        3 non-null int32
  dtypes: int32(2)
  memory usage: 48.0+ bytes
  None
  
  <class 'pandas.core.frame.DataFrame'>
  Int64Index: 3 entries, 0 to 2
  Data columns (total 2 columns):
  id        3 non-null int64
  amount    3 non-null int32
  dtypes: int32(1), int64(1)
  memory usage: 60.0 bytes
  None
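
One possible workaround, sketched here rather than taken from the issue thread, is to remember the key column's dtype before grouping and cast it back afterwards. The names `key_dtype` and `grouped` are illustrative only, and the cast is assumed to be a harmless no-op on pandas versions that already keep the original dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "id": np.array([3234, 332635, 325993], dtype="int32"),
        "amount": np.array([34, 43, 32], dtype="int32"),
    },
    index=["a", "b", "c"],
)

# Remember the key dtype before grouping (int32 in this example).
key_dtype = df["id"].dtype

grouped = df.groupby("id", as_index=False).sum()

# Cast the key column back explicitly, whatever dtype groupby produced.
grouped["id"] = grouped["id"].astype(key_dtype)

print(grouped.dtypes)
```

Casting back is only safe when every key value fits in the narrower dtype, which holds here because the values came from an int32 column in the first place.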

pd.show_versions():

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.20.3
pytest: 2.8.5
pip: 9.0.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None
None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

3 reactions
daragallagher commented, May 11, 2020

I know this is an old and closed issue but I was bitten by this issue recently.

And I found myself confused by the above comment (“we don’t allow non int64 indexers in a result index”):

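import numpy as np
import pandas as pd

# One single-row column per dtype; np.bool_ replaces the np.bool alias,
# which is not available in every NumPy release.
df = pd.DataFrame({
    'a': np.array([1], dtype=np.int32),
    'b': np.array([1.23], dtype=np.float64),
    'c': np.array([3], dtype=np.uint8),
    'd': np.array([7.8], dtype=np.float16),
    'e': np.array([False], dtype=np.bool_)})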

I can get a bunch of indexers:

In [4]: df.groupby('a').sum().index                                                                                                                 
Out[4]: Int64Index([1], dtype='int64', name='a')

In [5]: df.groupby('b').sum().index                                                                                                                 
Out[5]: Float64Index([1.23], dtype='float64', name='b')

In [6]: df.groupby('c').sum().index                                                                                                                 
Out[6]: UInt64Index([3], dtype='uint64', name='c')

In [7]: df.groupby('d').sum().index                                                                                                                 
Out[7]: Float64Index([7.80078125], dtype='float64', name='d')

In [8]: df.groupby('e').sum().index                                                                                                                 
Out[8]: Index([False], dtype='object', name='e')

I’m not sure how users are supposed to anticipate this behaviour, as the groupby documentation does not mention it.
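
To check what an installed pandas version does, the comparison above can be automated. The following is a rough sketch, not part of the original comment: `index_dtypes_after_groupby` is a hypothetical helper that records each column's dtype next to the dtype of the index produced by grouping on that column. The float16 column is left out on the assumption that it is the least portable case, and no expected output is shown because the result index dtypes may differ between pandas versions.

```python
import numpy as np
import pandas as pd


def index_dtypes_after_groupby(frame):
    """Map each column name to (column dtype, dtype of the groupby result index)."""
    report = {}
    for col in frame.columns:
        result_index = frame.groupby(col).sum().index
        report[col] = (frame[col].dtype, result_index.dtype)
    return report


df = pd.DataFrame({
    "a": np.array([1, 2], dtype=np.int32),
    "b": np.array([1.23, 4.56], dtype=np.float64),
    "c": np.array([3, 4], dtype=np.uint8),
    "e": np.array([False, True], dtype=np.bool_),
})

report = index_dtypes_after_groupby(df)
for name, (before, after) in report.items():
    marker = "unchanged" if before == after else "changed"
    print(f"{name}: {before} -> {after} ({marker})")
```

Any column reported as changed is a candidate for an explicit `astype` after the aggregation.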

0 reactions
rjafarau commented, May 14, 2021

Agreed with @daragallagher. This behavior is not obvious from the groupby documentation! @jreback
