BUG: Spearman correlation is broken (dtype mismatch) on 32-bit platforms
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
import pandas as pd
d = pd.DataFrame([1.0, 2.0])
d.corr(method='spearman')
Issue Description
Calling the corr method of a DataFrame with method='spearman' produces a ValueError due to a buffer dtype mismatch on 32-bit platforms:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.10/site-packages/pandas/core/frame.py", line 9376, in corr
    correl = libalgos.nancorr_spearman(mat, minp=min_periods)
  File "pandas/_libs/algos.pyx", line 415, in pandas._libs.algos.nancorr_spearman
  File "pandas/_libs/algos.pyx", line 938, in pandas._libs.algos.rank_1d
ValueError: Buffer dtype mismatch, expected 'const intp_t' but got 'long long'
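The message reads as if a 64-bit integer array ('long long') reached a buffer typed as the platform index type (intp_t), which is 32 bits wide on a 32-bit build. The NumPy-only sketch below illustrates that width difference and one way to coerce an array to the platform index type. It is an illustration under that assumption, not the pandas fix; the helper names index_dtype_report and as_platform_index are hypothetical.

```python
import numpy as np


def index_dtype_report():
    """Compare the platform index type (intp) with a fixed 64-bit integer."""
    intp = np.dtype(np.intp)
    int64 = np.dtype(np.int64)
    return {
        "intp_bits": intp.itemsize * 8,    # 32 on 32-bit builds, 64 on 64-bit builds
        "int64_bits": int64.itemsize * 8,  # always 64
        "same_width": intp.itemsize == int64.itemsize,
    }


def as_platform_index(labels):
    """Return labels as an intp array; no copy is made if it already is one."""
    return np.asarray(labels).astype(np.intp, copy=False)


# An int64 array like this one matches intp only where intp is 64 bits wide.
labels = np.zeros(4, dtype=np.int64)

print(index_dtype_report())
print(as_platform_index(labels).dtype)
```

On a 64-bit interpreter the two types have the same width, which would explain why the mismatch only surfaces on 32-bit x86 and ARM.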
If I have some time, I’ll look into this further and try to offer a PR. The problem was discovered due to a failing test in pingouin (https://github.com/raphaelvallat/pingouin/issues/197).
I have reproduced this on both 32-bit x86 and 32-bit ARM. While my “installed versions” are those currently in Fedora Rawhide, including pandas 1.3.0, I also built an RPM for pandas 1.3.3 and reproduced the bug with that too.
Expected Behavior
     0
0  1.0
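As a cross-check on the expected value, Spearman correlation is the Pearson correlation of the ranks, so a column correlated with itself should give 1.0. The sketch below uses only NumPy and assumes there are no tied values; spearman_no_ties is a hypothetical helper, not a pandas function, and it does not go through the failing pandas code path.

```python
import numpy as np


def spearman_no_ties(x, y):
    """Spearman correlation as Pearson correlation of ranks (assumes no ties)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # argsort of argsort yields 0-based ranks when all values are distinct
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


# The single column from the reproducer, correlated with itself
print(spearman_no_ties([1.0, 2.0], [1.0, 2.0]))
```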
Installed Versions
INSTALLED VERSIONS
commit           : f00ed8f47020034e752baf0250483053340971b0
python           : 3.10.0.candidate.2
python-bits      : 32
OS               : Linux
OS-release       : 5.13.14-200.fc34.x86_64
Version          : #1 SMP Fri Sep 3 15:33:01 UTC 2021
machine          : armv7l
processor        : armv7l
byteorder        : little
LC_ALL           : None
LANG             : C.UTF-8
LOCALE           : en_US.UTF-8
pandas           : 1.3.0
numpy            : 1.21.1
pytz             : 2021.1
dateutil         : 2.8.1
pip              : None
setuptools       : 57.4.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.6.3
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : 1.3.2
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.5.0b1
numexpr          : 2.7.1
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.0
sqlalchemy       : None
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None
Comments: 7 (7 by maintainers)
Thanks! I appreciate your efforts.
Once a fix is available, I’ll work with the maintainers of the python-pandas package in Fedora Linux to try to make sure it is part of the upcoming Fedora 35 release. The current Fedora 34 release has pandas 1.2.5, which (I’ve verified) predates the regression.

Current master (5872bfe1c713ddb27b337e5b6549bf497e44834b) works as expected on 32-bit x86.