BUG: Inconsistent behaviour when averaging Decimals, floats and ints

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.

from decimal import Decimal

import pandas as pd

# Decimal column next to a float column
df = pd.DataFrame({'col_1': [Decimal(1.5), Decimal(4.0)], 'col_2': [5.0, 10.0]})
df.mean(axis=1) # returns 5, 10 -- the Decimal column is ignored in the averaging

# Decimal column next to an int column
df2 = pd.DataFrame({'col_1': [Decimal(1.5), Decimal(4.0)], 'col_2': [5, 10]})
df2.mean(axis=1) # returns 3.25, 7 -- the Decimal column is included in the averaging

Problem description

Decimal values are averaged inconsistently depending on whether they are averaged with an int or with a float. Is it expected that the two DataFrames above return different results?

Expected Output

I would expect in both cases to see 3.25 and 7 as the mean of the rows.
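One possible workaround, sketched here under the assumption that losing Decimal precision is acceptable, is to cast every column to float before averaging. The name `row_means` is only illustrative.

```python
from decimal import Decimal

import pandas as pd

df = pd.DataFrame({'col_1': [Decimal(1.5), Decimal(4.0)], 'col_2': [5.0, 10.0]})

# Cast the object-dtype Decimal column (and everything else) to float64 first,
# so the row-wise mean sees only one numeric type. Decimal precision is lost.
row_means = df.astype(float).mean(axis=1)
print(row_means.tolist())
```

Because both columns are float64 after the cast, the result no longer depends on whether the second column started out as int or float.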

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.6.6.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.18.3
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.16
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.2.8
numba : None

Issue Analytics

Created 3 years ago
Comments: 8 (4 by maintainers)

Top GitHub Comments

dsaxton commented, May 19, 2020

@cuchoi I can close. Thanks for the report nonetheless, it’s an interesting edge case

dsaxton commented, May 16, 2020

Now the second row doesn’t include in the average the Decimal in the operation. Is this expected behaviour as well?

Yes, that’s expected / consistent with the above. It’s trying to do the averaging across all rows and if that fails falls back on only the “numeric” columns.
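The fallback described above can be illustrated in plain Python. This is only a sketch of the idea, not pandas' actual implementation: `mean_rows` is a hypothetical helper, and it treats only `int` and `float` values as "numeric". The underlying fact is that `Decimal + int` is allowed, while `Decimal + float` raises `TypeError`.

```python
from decimal import Decimal


def mean_rows(rows):
    """Hypothetical helper: average every value; on failure use only ints/floats."""
    try:
        # First attempt: include every value in each row.
        return [sum(row) / len(row) for row in rows]
    except TypeError:
        # Decimal + float is not supported, so drop the non-numeric values
        # and average what is left (assumes each row keeps at least one value).
        numeric = [[v for v in row if isinstance(v, (int, float))] for row in rows]
        return [sum(row) / len(row) for row in numeric]


with_floats = mean_rows([[Decimal(1.5), 5.0], [Decimal(4.0), 10.0]])
with_ints = mean_rows([[Decimal(1.5), 5], [Decimal(4.0), 10]])
print(with_floats, with_ints)
```

With floats, the first attempt fails and only the float values are averaged, which matches the 5 and 10 in the report. With ints, the first attempt succeeds and the Decimals are included, which matches 3.25 and 7.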