pandas groupby/aggregation operation changes 'key' column dtype
Reference: https://stackoverflow.com/questions/34059704/pandas-groupby-aggregation-operation-changes-key-column-dtype When I apply a groupby and aggregation to the original data frame, the dtype of the ‘key’ column changes from int32 to int64. Here is a simple example:
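import numpy as np
import pandas as pd

df = pd.DataFrame({'id': np.array([3234, 332635, 325993]), 'amount': np.array([34, 43, 32])},
                  index=['a', 'b', 'c'],
                  dtype='int32')
print(df.info())  # before grouping: both columns are int32
df = df.groupby('id', as_index=False).sum()
print(df.info())  # after grouping: 'id' is reported as int64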
output is:
<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, a to c
Data columns (total 2 columns):
amount    3 non-null int32
id        3 non-null int32
dtypes: int32(2)
memory usage: 48.0+ bytes
None
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 2 columns):
id        3 non-null int64
amount    3 non-null int32
dtypes: int32(1), int64(1)
memory usage: 60.0 bytes
None
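A possible workaround for the dtype change reported here is to record the key column's dtype before grouping and cast it back afterwards. This is only a minimal sketch using the same data as the example; the variable names `key_dtype` and `result` are mine, and whether the cast is actually needed depends on the pandas version in use.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"id": np.array([3234, 332635, 325993]), "amount": np.array([34, 43, 32])},
    index=["a", "b", "c"],
    dtype="int32",
)

# Remember the key dtype before grouping; the groupby result may widen it.
key_dtype = df["id"].dtype

result = df.groupby("id", as_index=False).sum()

# Cast the key column back explicitly. On pandas versions that already
# preserve the key dtype, this cast changes nothing.
result["id"] = result["id"].astype(key_dtype)

print(result.dtypes)
```

The explicit cast costs one extra copy of the key column, but it makes the resulting dtype independent of how a given pandas version builds the group index.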
pd.show_versions():
INSTALLED VERSIONS
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.20.3
pytest: 2.8.5
pip: 9.0.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None
None
Issue Analytics
- Created 6 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
I know this is an old, closed issue, but I was bitten by it recently.
And I found myself confused by the above comment (“we don’t allow non int64 indexers in a result index”):
I can get a bunch of indexers:
I’m not sure how users are supposed to expect this behaviour, as the groupby documentation makes no mention of it.
Agreed with @daragallagher. It’s not obvious behavior from groupby documentation! @jreback