cast to float when using ``groupby.agg`` with function returning ``int`` on ``float`` input
Code Sample, a copy-pastable example if possible
In [2]: df = pd.DataFrame([[1], [2], [3.3]])
In [3]: df.groupby([1,1,1]).agg(len)
Out[3]:
     0
1  3.0
Problem description
The result of ``len`` should be ``int``, regardless of the input. This is not specific to ``len``: ``lambda x: 3`` results in the same.
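One way a cast like this can arise, sketched with NumPy alone: if the output array is preallocated with the input column's dtype and the per-group results are written into it, integer results are silently stored as floats. Building the array from the results instead lets the dtype follow what the function returned. This is an assumption about the mechanism for illustration, not pandas' actual code path; ``group_results`` and ``input_dtype`` are hypothetical names.

```python
import numpy as np

# Per-group results as the callable returns them: len() gives a Python int.
# One group of three rows, matching the example above.
group_results = [3]
input_dtype = np.dtype("float64")  # dtype of the column being aggregated

# Strategy A (hypothetical): preallocate the output with the *input* dtype
# and fill it. The int is stored as a float; recovering an int would need a
# separate downcast step afterwards.
out_a = np.empty(len(group_results), dtype=input_dtype)
for i, value in enumerate(group_results):
    out_a[i] = value

# Strategy B (hypothetical): infer the dtype from the results themselves,
# so the input dtype plays no role.
out_b = np.asarray(group_results)

print(out_a.dtype, out_a)  # float64 [3.]
print(out_b.dtype, out_b)  # an integer dtype (platform-dependent width)
```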
Expected Output
Compare to
In [4]: df.apply(len)
Out[4]:
0    3
dtype: int64
In [5]: df.groupby([1,1,1]).apply(len)
Out[5]:
1    3
dtype: int64
In [6]: df.astype(int).groupby([1,1,1]).agg(len)
Out[6]:
   0
1  3
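Until the cast is avoided inside ``agg``, one workaround is to let the result dtype follow the returned values. The sketch below uses a hypothetical helper, ``agg_infer_dtype`` (not a pandas API), that iterates over the groups and builds the result with ``DataFrame.from_dict``, so the integers returned by ``len`` or ``lambda x: 3`` are inferred as an integer dtype. It is a minimal sketch, assuming ``func`` returns one scalar per column, and it will be slower than the built-in aggregation path.

```python
import pandas as pd


def agg_infer_dtype(df, by, func):
    """Hypothetical helper (not a pandas API): call ``func`` on each column
    of each group and let pandas infer the result dtype from the returned
    values, instead of casting back to the input column's dtype."""
    rows = {}
    for key, group in df.groupby(by):
        # func receives one column (a Series) at a time, as with agg
        rows[key] = {col: func(group[col]) for col in group.columns}
    # Python ints returned by func are inferred as an integer dtype here
    return pd.DataFrame.from_dict(rows, orient="index")


df = pd.DataFrame([[1], [2], [3.3]])
labels = [1, 1, 1]

print(agg_infer_dtype(df, labels, len))
print(agg_infer_dtype(df, labels, lambda x: 3))

# For len specifically, GroupBy.size gives the same per-group row count
# (rows with NaN included) and is integer-typed whatever the data dtype.
print(df.groupby(labels).size())
```

Note that ``size`` returns a Series rather than a one-column DataFrame, so it is a drop-in replacement only when that shape is acceptable.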
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: 9e7666dae3b3b10d987ce154a51c78bcee6e0728
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.21.0.dev+265.g9e7666dae
pytest: 3.0.6
pip: 9.0.1
setuptools: None
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.1.0.dev
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.2.1
Top GitHub Comments
My point was that the result of ``len`` is an ``int``, and should never become a float, so there should be no need to downcast (the fact that input data was ``float`` should be irrelevant). But I will probably just need to look at the code to understand what you mean.

Edit: The following comment is not relevant to the issue here, since the handling of a string argument differs from that of a callable.
I ran into a similar issue with ``.transform('nunique')``.
The resulting Series I get are floats using 0.25.3. They become integers if the column ``b`` values are integers, or if I replace ``.transform('nunique')`` with merely ``.nunique()``.
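The original snippet for this comment is not included, so the following sketch assumes a frame with a grouping column ``a`` and a float column ``b`` (both hypothetical). Because a distinct-value count is always a whole number, casting the ``transform('nunique')`` result with ``astype('int64')`` is lossless, which gives integers whether or not the installed version returns floats here.

```python
import pandas as pd

# Hypothetical data: column "a" holds group keys, column "b" holds floats.
df = pd.DataFrame({"a": ["x", "x", "y"], "b": [1.0, 2.0, 2.0]})

# Number of distinct "b" values per group, broadcast back to every row.
counts = df.groupby("a")["b"].transform("nunique")

# A count of distinct values is always whole, so this cast loses nothing;
# it only normalizes versions that hand back a float dtype.
counts = counts.astype("int64")
print(counts.tolist())  # [2, 2, 1]

# The per-group (non-broadcast) form mentioned in the comment.
print(df.groupby("a")["b"].nunique())
```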