
Cannot enforce zero correlation in samples


Reproducing code example:


import numpy as np
np.random.seed(11)
n=100
mean = [0, 0]
cov = [[1, 0], [0, 1]]      # Should have zero covariance

x, y = np.random.multivariate_normal(mean=mean, cov=cov, size=n).T

np.corrcoef(x,y)

yields:

array([[ 1., -0.02736689],
       [-0.02736689,  1.]])

whereas I’d expect to see:

array([[ 1., 0],
       [0,  1.]])

or off-diagonal values that are negligibly small, on the order of 1e-16. The nonzero correlation shows up for every seed I tried.

R, for example, does not have this problem:

library(MASS)
library(purrr)

# Means
m1 <- 5
m2 <- 10
# variances
s1 <- 5
s2 <- 1
# Covariance (off-diagonal entry of Sigma)
cov <- 0

set.seed(11)

dat <- mvrnorm(20, mu = c(m1, m2),
                     Sigma = matrix(c(s1,cov,
                                      cov, s2),
                                    ncol = 2, byrow = TRUE),
                     empirical = TRUE)

dat %>% cor()

Here the correlation matrix comes out exactly as I’d expect.

Numpy/Python version information:

NumPy 1.18.3, Python 3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 08:20:52) [GCC 7.3.0]

Issue Analytics

State:
Created 3 years ago
Comments:15 (9 by maintainers)

Top GitHub Comments

rkern commented, May 8, 2020

You want vh.T. Our svd() routine follows the convention where the Hermitian of the V matrix gets returned (which is why we name it vh in the docstring), whereas your R code is using the straight V matrix.
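To make the convention concrete, here is a minimal sketch of how `np.linalg.svd` returns its factors. The example matrix `a` is made up for illustration; the point is only that the third return value is already the (conjugate) transpose of V, so V itself has to be recovered with `vh.T` (or `vh.conj().T` for complex input).

```python
import numpy as np

# Arbitrary symmetric example matrix, chosen only for illustration.
a = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# NumPy returns U, the singular values, and V^H (not V).
u, s, vh = np.linalg.svd(a)

# The factorization is a = U @ diag(s) @ V^H, so vh is used as-is here.
reconstructed = u @ np.diag(s) @ vh

# To get the V matrix itself (what R's svd()$v holds), undo the transpose.
v = vh.conj().T

print(np.allclose(reconstructed, a))
```

Code ported from R that multiplies by V therefore needs `vh.T` on the NumPy side.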

eric-wieser commented, May 8, 2020

I’d ask it again, and link to this issue.

I recommend you phrase it in terms of transforming the samples to have exactly the requested statistics, and adding an empirical argument to match R.
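As a rough sketch of what "transforming the samples to have exactly the requested statistics" could look like, the snippet below centers and whitens a standard normal draw, then recolors it with the requested covariance. The function `empirical_multivariate_normal` is a hypothetical helper written for this illustration, not a NumPy API. It uses Cholesky factors rather than the SVD discussed above, and it assumes `cov` is positive definite and that there are more samples than dimensions.

```python
import numpy as np

def empirical_multivariate_normal(mean, cov, size, rng=None):
    """Hypothetical helper (not part of NumPy): draw `size` samples whose
    sample mean and sample covariance equal `mean` and `cov` up to rounding."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.shape[0]
    if size <= d:
        raise ValueError("need more samples than dimensions")

    # Start from ordinary standard normal draws and remove the sample mean.
    z = rng.standard_normal((size, d))
    z -= z.mean(axis=0)

    # Whiten: after this step the sample covariance is the identity.
    sample_chol = np.linalg.cholesky(np.cov(z, rowvar=False))
    white = np.linalg.solve(sample_chol, z.T).T

    # Recolor with the requested covariance (assumed positive definite)
    # and shift to the requested mean.
    target_chol = np.linalg.cholesky(cov)
    return mean + white @ target_chol.T

samples = empirical_multivariate_normal(
    [0, 0], [[1, 0], [0, 1]], size=100, rng=np.random.default_rng(11)
)
x, y = samples.T
print(np.corrcoef(x, y))  # off-diagonal entries are zero up to rounding
```

The price of matching the statistics exactly is that the rows are no longer independent draws from the distribution, which is presumably why this would be opt-in behavior.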

It seems like this would be a reasonable request for many distributions, not just mvr. Even a standard normal could have this behavior.
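For the univariate case the same idea reduces to standardizing the draw. This is only an illustrative sketch; the variable names are made up, and nothing here is an existing NumPy option.

```python
import numpy as np

rng = np.random.default_rng(11)
z = rng.standard_normal(100)

# Rescale so the sample mean is 0 and the sample standard deviation
# (with ddof=1) is 1, up to floating-point rounding.
z_exact = (z - z.mean()) / z.std(ddof=1)

print(z_exact.mean(), z_exact.std(ddof=1))
```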