ENH: stats: implementation of `multivariate_normal.logcdf` suitable for small probability mass

Hello there!

I am trying to use the multivariate normal integration algorithm that ships with SciPy in the mvn module. Specifically, I am calling the mvn.mvnun function with a very large number of variables (n>1000).

The function seems to return 0.0 with an INFORM value of 0 for these very large covariance matrices. I suspect there might be a numerical accuracy issue in the simulations that renders the result 0, although the true value is probably a very small positive number.

If this is the case (I am not an expert in this field), is there a way to perform the simulations for the logarithm of the integral, so that the result doesn’t degenerate to 0?
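To make the underflow concern concrete, here is a minimal one-dimensional sketch (not the multivariate case asked about). It assumes only scipy.stats.norm and compares the ordinary CDF with the log-CDF far in the lower tail. The evaluation point x = -40 is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.stats import norm

# A point far in the lower tail of a standard normal (illustrative choice).
x = -40.0

# The true probability is far below the smallest positive double,
# so the ordinary CDF underflows to 0.0.
p = norm.cdf(x)

# The log-CDF is evaluated directly in log space and stays finite.
log_p = norm.logcdf(x)

print(p)       # 0.0
print(log_p)   # about -804.6
```

The same effect is what the question is about in many dimensions: the probability itself is not representable, while its logarithm would be.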

Thanks for your help!

Here is the code for the MVN integral I am trying to compute, which returns 0. It represents a typical points-on-a-grid setup in spatial analysis, where the covariance matrix is obtained from an isotropic radial kernel:

import numpy as np
from scipy.stats import mvn 
import time

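def correl( dx, s, w ):
    # Isotropic radial kernel: s * exp(-w * dx^2)
    return s * np.exp( -w*dx**2 )

def buildCov( nrows, ncols, s, w ):
    # Covariance between every pair of points on an nrows x ncols grid
    N = nrows * ncols
    S = np.zeros( [N, N] )
    for i in range( N ):
        for j in range( i,N ):
            # np.int was removed in NumPy 1.24; integer division gives the row index
            rowi = i // ncols
            coli = i - ncols * rowi + 1
            rowj = j // ncols
            colj = j - ncols * rowj + 1
            ijdist = np.sqrt( (rowi - rowj)**2 + (coli - colj)**2 )
            S[i,j] = correl( ijdist, s, w )
            S[j,i] = S[i,j]  # fill the lower triangle by symmetry
    return S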

# Matrix of covariates
nrows = 25
ncols = 25
N = nrows*ncols

s = 1
w = ( 1/(min(nrows,ncols)/3) )**2
S = buildCov( nrows, ncols, s, w )
M = np.zeros( N )
inf = -np.inf * np.ones( N )
sup = 0 * np.ones( N )

t0 = time.time()
p,i = mvn.mvnun( inf, sup, M, S )
t2 = time.time() - t0
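
The double loop above visits every pair of grid points in pure Python. As a sketch, the same covariance matrix can be built with NumPy broadcasting. The function name build_cov_vectorized is hypothetical and not part of the original post. The kernel and parameters are copied from the code above.

```python
import numpy as np

def build_cov_vectorized(nrows, ncols, s, w):
    """Covariance s * exp(-w * d**2) for all pairs of points on an nrows x ncols grid."""
    idx = np.arange(nrows * ncols)
    rows = idx // ncols          # row index of each flattened grid point
    cols = idx % ncols           # column index of each flattened grid point
    # Squared Euclidean distance between every pair of points, via broadcasting.
    d2 = (rows[:, None] - rows[None, :]) ** 2 + (cols[:, None] - cols[None, :]) ** 2
    return s * np.exp(-w * d2)

# Same parameters as in the post.
nrows = ncols = 25
s = 1
w = (1 / (min(nrows, ncols) / 3)) ** 2
S = build_cov_vectorized(nrows, ncols, s, w)
print(S.shape)  # (625, 625)
```

The "+ 1" column offset in the original loop cancels when taking differences, so it is omitted here.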

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 8 (6 by maintainers)

Top GitHub Comments

1 reaction
rkern commented, May 17, 2022

And I am not aware of any implementation of the log-CDF that would help here. You might try searching through the papers citing the original, or look for an implementation in some other system (R, Julia, MATLAB, whatever). I think it would be hard, though.

It looks like there are approaches that exploit the banded structure of your covariance matrix, but they won’t be general for use in multivariate_normal.

1 reaction
rkern commented, May 17, 2022

The underlying algorithm only supports dimensionality up to 500. I think the implementation is supposed to explicitly return early with INFORM=2 to indicate that, but nonetheless, the algorithm won’t scale up that high. It does look like there have been some bugs fixed in the upstream package (MVNDST here) since we integrated it that might be relevant to your n=25 example.
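
Given the limit mentioned above, one defensive option is to check the dimensionality before calling the integrator, instead of relying on the returned INFORM value. This is a sketch only: the helper name check_dimension is hypothetical, and the limit of 500 is taken from the comment above, not verified against any particular SciPy version.

```python
import numpy as np

# Limit quoted in the comment above; treated here as an assumption.
MAX_DIM = 500

def check_dimension(cov, max_dim=MAX_DIM):
    """Return the dimension of a square covariance matrix, or raise if it is too large."""
    cov = np.asarray(cov)
    if cov.ndim != 2 or cov.shape[0] != cov.shape[1]:
        raise ValueError("cov must be a square matrix")
    n = cov.shape[0]
    if n > max_dim:
        raise ValueError(
            f"dimension {n} exceeds the assumed integrator limit of {max_dim}"
        )
    return n

print(check_dimension(np.eye(100)))  # 100

try:
    check_dimension(np.eye(625))     # the 25 x 25 grid from the question
except ValueError as exc:
    print(exc)
```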

In such high dimensions, I expect that for most practical purposes, the answer is ~0 if the mean coordinate is within the bounds and ~1 outside of it without a whole lot that’s practically reachable in between. High dimensions are weird.
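
To illustrate how quickly the probability mass vanishes in high dimensions, consider the special case of independent components (a diagonal covariance matrix). This is an assumption made purely for illustration and does not apply to the correlated grid in the question. Under it, the joint CDF factorizes, so its logarithm is a sum of one-dimensional log-CDFs. The function name log_orthant_prob_independent is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def log_orthant_prob_independent(upper, mean, std):
    """log P(X_i <= upper_i for all i), assuming independent normal components."""
    z = (np.asarray(upper) - np.asarray(mean)) / np.asarray(std)
    # Independence turns the product of marginal CDFs into a sum of log-CDFs.
    return np.sum(norm.logcdf(z))

N = 2000
log_p = log_orthant_prob_independent(np.zeros(N), np.zeros(N), np.ones(N))

print(log_p)          # N * log(0.5), about -1386.29
print(np.exp(log_p))  # 0.0: the probability itself underflows
```

Even with the mean sitting exactly on the upper bound in every coordinate, the probability is 2^-2000, which is not representable as a double, while its logarithm is an ordinary number. This only covers the independent case. A general log-CDF for correlated components is what the issue asks for.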
