Cannot enforce zero correlation in samples
Reproducing code example:
import numpy as np
np.random.seed(11)
n=100
mean = [0, 0]
cov = [[1, 0], [0, 1]] # Should have zero covariance
x, y = np.random.multivariate_normal(mean=mean, cov=cov, size=n).T
np.corrcoef(x,y)
yields:
array([[ 1., -0.02736689],
[-0.02736689, 1.]])
While I’d expect to see:
array([[ 1., 0],
[0, 1.]])
or some negligible numbers like 1e-16. This can be reproduced for any seed.
This doesn’t cause any problem in R, for example:
library(MASS)
library(purrr)
# Means
m1 <- 5
m2 <- 10
# variances
s1 <- 5
s2 <- 1
# Covariance
cov <- 0
set.seed(11)
dat <- mvrnorm(20, mu = c(m1, m2),
Sigma = matrix(c(s1,cov,
cov, s2),
ncol = 2, byrow = TRUE),
empirical = TRUE)
dat %>% cor()
Here the correlation matrix behaves as I’d expect.
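For context, the NumPy call draws random samples from a population with zero covariance, so the sample correlation is itself random and only zero on average. The R call with empirical = TRUE instead adjusts the draws so the sample statistics match exactly. A minimal sketch of the sampling variation, assuming an arbitrary seed and repetition count chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, used only for repeatability
mean = [0, 0]
cov = [[1, 0], [0, 1]]  # zero population covariance


def sample_corr(n):
    """Sample correlation of one draw of n points from the population above."""
    x, y = rng.multivariate_normal(mean=mean, cov=cov, size=n).T
    return np.corrcoef(x, y)[0, 1]


# Repeat the experiment and compare the spread with the 1/sqrt(n) rule of thumb.
spread = {}
for n in (100, 10_000):
    r = np.array([sample_corr(n) for _ in range(200)])
    spread[n] = r.std()
    print(f"n={n}: std of sample correlation = {spread[n]:.4f}, "
          f"1/sqrt(n) = {1 / np.sqrt(n):.4f}")
```

With n=100 the spread is roughly 0.1, so a value such as -0.027 is well within what independent draws produce.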
Numpy/Python version information:
1.18.3 3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 08:20:52) [GCC 7.3.0]
Issue Analytics
- State:
- Created 3 years ago
- Comments:15 (9 by maintainers)
You want vh.T. Our svd() routine follows the convention where the Hermitian of the V matrix gets returned (which is why we name it vh in the docstring), whereas your R code is using the straight V matrix.

I’d ask it again, and link to this issue. I recommend you phrase it in terms of transforming the samples to have exactly the requested statistics, and adding an empirical argument to match R.

It seems like this would be a reasonable request for many distributions, not just mvr. Even a standard normal could have this behavior.
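
The transformation discussed in these comments can be sketched in NumPy as follows. This is only an illustration under stated assumptions: mvn_empirical is a hypothetical helper name, not part of NumPy, and the steps (center, rotate with the V matrix from the SVD, rescale, then apply the requested covariance) are one way to get samples whose sample mean and covariance match the request. Note the use of vh.T to obtain V, as pointed out above.

```python
import numpy as np


def mvn_empirical(mean, cov, size, rng=None):
    """Hypothetical helper (not a NumPy API): draw `size` samples whose
    sample mean and sample covariance equal `mean` and `cov` up to rounding."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    p = mean.shape[0]
    if size <= p:
        raise ValueError("need more samples than dimensions")

    z = rng.standard_normal((size, p))
    z -= z.mean(axis=0)                      # sample mean is now exactly zero
    # np.linalg.svd returns vh (the conjugate transpose of V), so V is vh.T.
    _, _, vh = np.linalg.svd(z, full_matrices=False)
    z = z @ vh.T                             # columns become uncorrelated
    z /= z.std(axis=0, ddof=1)               # unit sample variance per column

    # Apply the requested covariance via its eigendecomposition.
    w, v = np.linalg.eigh(cov)
    a = v * np.sqrt(np.clip(w, 0.0, None))   # a @ a.T reconstructs cov
    return mean + z @ a.T


# Usage: same population parameters as the R example in the issue.
samples = mvn_empirical([5, 10], [[5, 0], [0, 1]], size=20,
                        rng=np.random.default_rng(11))
print(np.corrcoef(samples.T))  # off-diagonal entries are at rounding-error level
```

Samples produced this way are no longer independent draws from the distribution, since they are constrained to hit the requested statistics exactly, which is presumably why this would be an opt-in argument rather than the default.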