Cannot enforce zero correlation in samples

Reproducing code example:


import numpy as np
np.random.seed(11)
n=100
mean = [0, 0]
cov = [[1, 0], [0, 1]]      # Should have zero covariance

x, y = np.random.multivariate_normal(mean=mean, cov=cov, size=n).T

np.corrcoef(x,y)

yields:

array([[ 1., -0.02736689],
       [-0.02736689,  1.]])  

whereas I’d expect to see:

array([[ 1., 0],
       [0,  1.]])  

or off-diagonal values that are negligibly small, on the order of 1e-16. The same behavior can be reproduced with any seed.

R, for example, does not have this problem:

library(MASS)
library(purrr)

# Means
m1 <- 5
m2 <- 10
# variances
s1 <- 5
s2 <- 1
# Covariance (off-diagonal term)
cov <- 0

set.seed(11)

dat <- mvrnorm(20, mu = c(m1, m2),
                     Sigma = matrix(c(s1,cov,
                                      cov, s2),
                                    ncol = 2, byrow = TRUE),
                     empirical = TRUE)

dat %>% cor()

Here the correlation matrix behaves as I’d expect.

Numpy/Python version information:

1.18.3 3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 08:20:52) [GCC 7.3.0]

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 15 (9 by maintainers)

Top GitHub Comments

1 reaction
rkern commented, May 8, 2020

You want vh.T. Our svd() routine follows the convention where the Hermitian of the V matrix gets returned (which is why we name it vh in the docstring), whereas your R code is using the straight V matrix.
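
To illustrate the convention described in this comment, here is a minimal sketch of how the third return value of numpy.linalg.svd relates to the plain V matrix. The matrix a is an arbitrary example chosen for illustration and does not come from the issue.

```python
import numpy as np

# Arbitrary real 2x2 matrix, used only to illustrate the convention.
a = np.array([[2.0, 0.5],
              [0.1, 1.0]])

# NumPy returns U, the singular values, and V^H (not V).
u, s, vh = np.linalg.svd(a)

# For a real matrix the plain V is the transpose of vh;
# for a complex matrix it would be vh.conj().T.
v = vh.T

# The factorization is a = U @ diag(s) @ V^H.
reconstructed = u @ np.diag(s) @ vh
print(np.allclose(reconstructed, a))

# Columns of V are the right singular vectors: a @ V == U * s.
print(np.allclose(a @ v, u * s))
```

Code ported from a language whose SVD routine returns V directly would therefore need vh.T wherever the original uses V.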

1 reaction
eric-wieser commented, May 8, 2020

I’d ask it again, and link to this issue.

I recommend you phrase it in terms of transforming the samples to have exactly the requested statistics, and adding an empirical argument to match R.

It seems like this would be a reasonable request for many distributions, not just mvr. Even a standard normal could have this behavior.
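
As a rough sketch of what "transforming the samples to have exactly the requested statistics" could look like in NumPy, the hypothetical helper below centers and whitens a standard normal draw, then rescales it so the sample mean and sample covariance match the requested values. The function name sample_with_exact_moments is an assumption made for illustration and is not part of NumPy. It uses Cholesky factors rather than the eigendecomposition approach of R's mvrnorm, and it assumes cov is positive definite and size exceeds the number of dimensions.

```python
import numpy as np

def sample_with_exact_moments(mean, cov, size, rng=None):
    """Hypothetical helper (not a NumPy API): draw `size` samples whose
    sample mean and sample covariance (ddof=1) equal `mean` and `cov`."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.shape[0]
    if size <= d:
        raise ValueError("size must exceed the number of dimensions")

    # Start from independent standard normal draws, one sample per row.
    z = rng.standard_normal((size, d))

    # Force the sample mean to be exactly zero.
    z -= z.mean(axis=0)

    # Whiten: make the sample covariance exactly the identity.
    sample_cov = z.T @ z / (size - 1)
    chol_sample = np.linalg.cholesky(sample_cov)
    z_white = np.linalg.solve(chol_sample, z.T).T

    # Colour with the requested covariance and shift to the requested mean.
    chol_target = np.linalg.cholesky(cov)
    return z_white @ chol_target.T + mean

rng = np.random.default_rng(11)
samples = sample_with_exact_moments([0, 0], [[1, 0], [0, 1]], size=100, rng=rng)
x, y = samples.T
print(np.corrcoef(x, y))
```

With this transformation the off-diagonal entries of the printed correlation matrix are zero up to floating-point rounding. The samples are no longer independent draws from the distribution, which is the trade-off that an empirical option makes.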
