ENH: stats: implementation of `multivariate_normal.logcdf` suitable for small probability mass
Hello there!
I am trying to use the multivariate normal integration algorithm embedded in SciPy, in the `mvn` package. Specifically, I am trying to use the `mvn.mvnun` function for a very large number of variables (n > 1000).
The function seems to return 0.0 with an INFORM value of 0 for these very large covariance matrices. I suspect there may be a numerical accuracy issue in the simulations that makes the result come out as 0, even though it is probably just a very small number.
If this is the case (I am not an expert in this field), is there a way to perform the simulations on the logarithm of the integral, so the result does not degenerate to 0?
Thanks for your help!
Here is the code for the MVN integral I am trying to compute, which returns a value of 0. It represents a typical points-on-a-grid setup in a spatial analysis, where the covariance matrix is obtained from an isotropic radial kernel:
```python
import numpy as np
from scipy.stats import mvn
import time

def correl(dx, s, w):
    # Isotropic radial (squared-exponential) kernel
    return s * np.exp(-w * dx**2)

def buildCov(nrows, ncols, s, w):
    # Covariance between all pairs of points on an nrows x ncols grid
    N = nrows * ncols
    S = np.zeros([N, N])
    for i in range(N):
        for j in range(i, N):
            rowi = i // ncols
            coli = i - ncols * rowi + 1
            rowj = j // ncols
            colj = j - ncols * rowj + 1
            ijdist = np.sqrt((rowi - rowj)**2 + (coli - colj)**2)
            S[i, j] = correl(ijdist, s, w)
            S[j, i] = S[i, j]
    return S

# Covariance matrix for a 25 x 25 grid (N = 625 variables)
nrows = 25
ncols = 25
N = nrows * ncols
s = 1
w = (1 / (min(nrows, ncols) / 3))**2
S = buildCov(nrows, ncols, s, w)

# Integrate the zero-mean MVN density over the orthant (-inf, 0]^N
M = np.zeros(N)
inf = -np.inf * np.ones(N)
sup = 0 * np.ones(N)
t0 = time.time()
p, i = mvn.mvnun(inf, sup, M, S)
t2 = time.time() - t0
```
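For a rough sense of scale (a hedged aside, not part of the original report): since all the kernel covariances here are nonnegative, Slepian's inequality bounds this orthant probability from below by the product of the marginal probabilities, 0.5**N, so the mass being sought can indeed be astronomically small. A minimal sketch of that bound, using only `scipy.stats.norm`:

```python
# Rough scale check (not from the original report): with nonnegative correlations,
# Slepian's inequality gives P(X <= 0) >= prod_i P(X_i <= 0) = Phi(0)**N = 0.5**N,
# so log P >= N * log(0.5). For N = 625 the bound is about exp(-433) ~ 7e-189.
# A value anywhere near that scale is far below what mvnun's quasi-Monte Carlo
# estimate can distinguish from 0, consistent with the 0.0 / INFORM = 0 result above.
import numpy as np
from scipy.stats import norm

N = 625
log_lower_bound = N * norm.logcdf(0.0)           # 625 * log(0.5) ≈ -433.2
print(log_lower_bound, np.exp(log_lower_bound))  # ≈ -433.22, ≈ 7.3e-189
```

If the true value is anywhere near this bound, returning its logarithm (as the title requests) would be the only way to report it meaningfully.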
And I am not aware of any implementation of the log-CDF that would help here. You might try searching through the papers citing the original, or looking for an implementation in some other system (R, Julia, MATLAB, whatever). I think it would be hard, though.
It looks like there are approaches that exploit the banded structure of your covariance matrix, but they won't be general for use in `multivariate_normal`.
The underlying algorithm only supports dimensionality up to 500. I think the implementation is supposed to explicitly return early with `INFORM=2` to indicate that, but nonetheless, the algorithm won't scale up that high. It does look like there have been some bugs fixed in the upstream package (MVNDST here) since we integrated it that might be relevant to your `n=25` example.
In such high dimensions, I expect that for most practical purposes, the answer is ~0 if the mean coordinate is within the bounds and ~1 outside of it, without a whole lot that's practically reachable in between. High dimensions are weird.
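To make that last point concrete (an added illustration that assumes independent standard-normal coordinates, not the correlated covariance from the report): in the independent case the orthant probability is just a product of marginal CDFs, Phi(t)**N, and as N grows the transition from ~0 to ~1 happens over a very narrow range of thresholds t:

```python
# Hedged illustration (independent standard-normal coordinates only): the probability
# that all N coordinates fall below a common threshold t is Phi(t)**N. As N grows,
# it jumps from essentially 0 to essentially 1 over a narrow range of t.
import numpy as np
from scipy.stats import norm

for N in (10, 100, 625):
    for t in (0.0, 1.0, 2.0, 3.0, 4.0):
        logp = N * norm.logcdf(t)   # log P(all N iid coordinates <= t)
        print(f"N={N:4d}  t={t:.1f}  P={np.exp(logp):.3e}")
```

Working in log space, as the `multivariate_normal.logcdf` enhancement in the title asks for, keeps the left tail of this transition meaningful even when the probability itself is far too small for the integrator to resolve.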