Unexpected results for diagonal entries when using generic callable in corr
Code Sample, a copy-pastable example if possible
import pandas as pd
from scipy.stats import pearsonr
df = pd.DataFrame({'A': [1,2,3], 'B': [2,5,6]})
print(df.corr(method=lambda x, y: pearsonr(x, y)[1]))
          A         B
A  1.000000  0.178912
B  0.178912  1.000000
Problem description
I want to use the method argument of corr to compute p-values. However, the diagonal elements are set to 1. I would expect them to be 0. They are set to 1 here: https://github.com/pandas-dev/pandas/blob/cb00deb94500205fcb27a33cc1d0df79a9727f8b/pandas/core/frame.py#L7025-L7026
Although I can see that 1 is expected for a ‘normal’ correlation, this is not the case in my example. Hence, I would suggest removing these two lines from frame.py.
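The behaviour described above can be checked without SciPy. The sketch below is only an illustration: `euclidean` and the `calls` list are hypothetical names, and the Euclidean distance is an assumed stand-in for any callable that should give 0 when a column is compared with itself. It records which column pairs corr actually passes to the callable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 5, 6]})

calls = []  # every (x, y) pair the callable is asked to compare


def euclidean(x, y):
    # Hypothetical stand-in for a p-value function: comparing a column
    # with itself would return 0 here, if corr ever asked for it.
    calls.append((tuple(x), tuple(y)))
    return float(np.linalg.norm(x - y))


result = df.corr(method=euclidean)
print(result)

# True if the callable was never invoked with a column paired with itself
print(all(x != y for x, y in calls))
```

The diagonal of `result` is 1.0 even though `euclidean` would return 0 for identical inputs, which is consistent with the value being filled in by corr rather than by the callable.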
Expected Output
          A         B
A  0.000000  0.178912
B  0.178912  0.000000
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.165-81-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: 4.1.1
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.3
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.3
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.4.0-b1
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: 4.3.0
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
Issue Analytics
- State:
- Created 5 years ago
- Comments:14 (14 by maintainers)
As @fabianrost84 indicated earlier, the 1 in the diagonal is indeed hard-coded and not returned by the generic callable passed to the .corr function. The 1’s along the diagonal would therefore appear regardless of what the callable returns.
It would be convenient for this particular p-value use case to have the callable calculate the diagonals too, but that opens the door to other changes, such as the resulting matrix from corr no longer having to be symmetric.
For the p-value issue, simply subtracting the diagonal would do, e.g. df.corr(method=...) - np.eye(len(df.columns)). I’m all for documenting the behavior and keeping the implementation as is.
#25729 is already merged, so I’ll open a new PR.
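The subtraction suggested in the comments can be sketched as follows. This is a minimal example under stated assumptions: `metric` is a hypothetical stand-in (mean absolute difference) for the issue's `pearsonr(x, y)[1]`, used so that the snippet depends only on NumPy and pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 5, 6]})


def metric(x, y):
    # Hypothetical stand-in; in the issue this would be pearsonr(x, y)[1]
    return float(np.abs(x - y).mean())


raw = df.corr(method=metric)           # diagonal is filled with 1 by corr
fixed = raw - np.eye(len(df.columns))  # subtract the identity to zero it out

print(fixed)
```

This relies on corr having written exactly 1 on the diagonal; off-diagonal entries are left untouched because the identity matrix is 0 there.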