Unexpected results for diagonal entries when using generic callable in corr

See original GitHub issue

Code Sample, a copy-pastable example if possible

import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame({'A': [1,2,3], 'B': [2,5,6]})
print(df.corr(method=lambda x, y: pearsonr(x, y)[1]))

          A         B
A  1.000000  0.178912
B  0.178912  1.000000

Problem description

I want to use the method argument of corr to compute p-values. However, diagonal elements are set to 1. I would expect them to be 0. They are set to 1 here: https://github.com/pandas-dev/pandas/blob/cb00deb94500205fcb27a33cc1d0df79a9727f8b/pandas/core/frame.py#L7025-L7026

Although I can see that 1 is expected on the diagonal for a ‘normal’ correlation, it is not the right value in my example, where the entries are p-values. Hence, I would suggest removing these two lines from frame.py.

Expected Output

          A         B
A  0.000000  0.178912
B  0.178912  0.000000

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.165-81-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.1.1
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.3
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.3
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.4.0-b1
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: 4.3.0
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 14 (14 by maintainers)

Top GitHub Comments

1 reaction
shadiakiki1986 commented, Mar 14, 2019

As @fabianrost84 indicated earlier, the 1 on the diagonal is indeed hard-coded and not returned by the generic callable passed to the .corr function. The callable below would still generate 1s along the diagonal:

import pandas as pd
import numpy as np
return_zero = lambda a, b: 0
df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)], columns=['dogs', 'cats'])
df.corr(method=return_zero)

It would be convenient for this particular p-value use case to have the callable calculate the diagonals too, but it opens the door to other changes such as the resultant matrix from corr not having to be symmetric.
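To illustrate the symmetry point, here is a minimal sketch. It assumes the behavior described in this thread: pandas fills the diagonal itself and mirrors each off-diagonal value. The callable `first_diff` is a hypothetical example, chosen only because swapping its arguments flips the sign of its result.

```python
import pandas as pd

df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)], columns=['dogs', 'cats'])

# Hypothetical asymmetric callable: first_diff(a, b) == -first_diff(b, a).
# pandas passes each column to the callable as a 1-D NumPy array.
def first_diff(a, b):
    return float(a[0] - b[0])

result = df.corr(method=first_diff)
print(result)
```

Even though the callable is not symmetric, both off-diagonal cells of the result hold the same value, and the diagonal is still 1.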

For the p-value issue, simply subtracting the diagonal would do, e.g. df.corr(method=...) - np.eye(len(df.columns)). I’m all for documenting the behavior and keeping the implementation as is.
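Applied to the DataFrame from the original report, the subtraction suggested above could look like the following sketch. The variable names `raw` and `pvalues` are illustrative only and do not come from the thread.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 5, 6]})

# corr() puts 1 on the diagonal regardless of what the callable returns.
raw = df.corr(method=lambda x, y: pearsonr(x, y)[1])

# Subtract the identity matrix so the diagonal becomes 0 while the
# off-diagonal p-values stay unchanged.
pvalues = raw - np.eye(len(df.columns))
print(pvalues)
```

This relies on the diagonal actually being 1, so it is only a workaround for the hard-coded value and not a general way to compute p-values for a column against itself.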

0 reactions
fbnrst commented, Mar 14, 2019

#25729 is already merged, so I’ll open a new PR.
