BUG: Inconsistent behaviour when averaging Decimals, floats and ints
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
from decimal import Decimal
import pandas as pd

df = pd.DataFrame({'col_1': [Decimal(1.5), Decimal(4.0)], 'col_2': [5.0, 10.0]})
df.mean(axis=1)  # returns 5, 10 -- the Decimal column is left out of the average
df2 = pd.DataFrame({'col_1': [Decimal(1.5), Decimal(4.0)], 'col_2': [5, 10]})
df2.mean(axis=1)  # returns 3.25, 7 -- the Decimal column is included in the average
Problem description
Averaging a Decimal column behaves inconsistently: the result depends on whether the other column holds ints or floats. Is it expected that the two DataFrames above return different results?
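The split likely comes from Python's own Decimal rules rather than from pandas alone: Decimal supports arithmetic with int, but deliberately raises TypeError when mixed with float. A quick standard-library check on the first row of each frame above shows this (the variable names are illustrative only):

```python
from decimal import Decimal

row_with_int = [Decimal(1.5), 5]      # first row of df2
row_with_float = [Decimal(1.5), 5.0]  # first row of df

# Decimal + int is well defined and yields a Decimal.
int_mean = sum(row_with_int) / len(row_with_int)
print(int_mean)  # 3.25

# Decimal + float is unsupported, so the sum raises TypeError.
try:
    float_mean = sum(row_with_float) / len(row_with_float)
except TypeError as exc:
    float_mean = None
    print("TypeError:", exc)
```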
Expected Output
In both cases I would expect 3.25 and 7 as the row means.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.6.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.18.3
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.16
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.2.8
numba : None
Top GitHub Comments
@cuchoi I can close this. Thanks for the report nonetheless; it’s an interesting edge case.
Yes, that’s expected and consistent with the above. It first tries to average across all columns for each row, and if that fails it falls back to only the “numeric” columns.
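To make that explanation concrete, here is a hypothetical, much-simplified model in plain Python with no pandas involved. The function name mean_axis1_with_fallback and the dict-of-lists frame representation are illustrative assumptions, not pandas' actual code path; the sketch only mirrors the rule described above: try every column, and if that raises, retry with the numeric columns only (a Decimal column has object dtype, so it does not count as numeric).

```python
from decimal import Decimal


def mean_axis1_with_fallback(columns):
    """Hypothetical model of the behaviour described above (not pandas' code).

    columns: dict mapping column name -> list of values, one value per row.
    """
    rows = list(zip(*columns.values()))
    try:
        # First attempt: average every column in each row.
        return [sum(row) / len(row) for row in rows]
    except TypeError:
        # Fallback for the whole frame: keep only columns whose values are
        # all int/float, standing in for numeric dtypes.
        numeric = [vals for vals in columns.values()
                   if all(isinstance(v, (int, float)) for v in vals)]
        return [sum(row) / len(row) for row in zip(*numeric)]


df = {"col_1": [Decimal(1.5), Decimal(4.0)], "col_2": [5.0, 10.0]}
df2 = {"col_1": [Decimal(1.5), Decimal(4.0)], "col_2": [5, 10]}

print(mean_axis1_with_fallback(df))   # [5.0, 10.0]
print(mean_axis1_with_fallback(df2))  # [Decimal('3.25'), Decimal('7')]
```

Because the fallback applies to the whole frame, a single failing Decimal + float addition is enough to drop col_1 from every row, which matches the 5, 10 result in the report.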