BUG: invgauss.cdf should return the correct value when `mu` is very small.

Currently, invgauss.cdf returns NaN when mu is too small. This is due to exp(1/mu) overflowing when mu is small (the docs say that this happens for values smaller than 0.0028). https://github.com/scipy/scipy/blob/fc77ea19923c39618547b4033a9185dd8a3afcc1/scipy/stats/_continuous_distns.py#L3504-L3509

In the expression evaluating the CDF, the term _norm_cdf(-fac*(x+mu)/mu) is zero when mu is very small, so the CDF evaluates to 1 because the term _norm_cdf(fac*(x-mu)/mu) is approximately 1 for very small mu.

I believe that returning NaN instead of 1 is not the best way to handle the overflow. The overflow of exp does not practically affect the final value of the CDF in cases when mu is very small, because _norm_cdf(-fac*(x+mu)/mu) is zero anyway.

Reproducing code example:

In [1]: import numpy as np

In [2]: from scipy.stats import invgauss

In [3]: rng = np.random.RandomState(1)

In [4]: mu = rng.uniform(0., 0.01, size=5)

In [5]: mu
Out[5]: 
array([4.17022005e-03, 7.20324493e-03, 1.14374817e-06, 3.02332573e-03,
       1.46755891e-03])

In [6]: invgauss.cdf(0.4, mu=mu)
/home/scipy/scipy/stats/_continuous_distns.py:3508: RuntimeWarning: overflow encountered in exp
  C1 += np.exp(1.0/mu) * _norm_cdf(-fac*(x+mu)/mu) * np.exp(1.0/mu)
/home/scipy/scipy/stats/_continuous_distns.py:3508: RuntimeWarning: invalid value encountered in multiply
  C1 += np.exp(1.0/mu) * _norm_cdf(-fac*(x+mu)/mu) * np.exp(1.0/mu)
Out[6]: array([ 1.,  1., nan,  1.,  1.])
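
The NaN above can be reproduced without SciPy. What follows is a minimal sketch using only Python's `math` module. The names `norm_cdf`, `exp_like_numpy`, and `naive_invgauss_cdf` are illustrative and are not SciPy functions; the expression simply mirrors the line quoted in the warning. It assumes NumPy's behaviour of returning inf on overflow, which `math.exp` does not share, so the overflow is caught and mapped to inf by hand.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def exp_like_numpy(t):
    """math.exp raises OverflowError; np.exp returns inf with a RuntimeWarning."""
    try:
        return math.exp(t)
    except OverflowError:
        return math.inf


def naive_invgauss_cdf(x, mu):
    """Mirror of the expression shown in the warning (illustrative only)."""
    fac = math.sqrt(1.0 / x)
    c1 = norm_cdf(fac * (x - mu) / mu)
    # When exp(1/mu) is inf and the normal CDF has underflowed to 0.0,
    # the product inf * 0.0 is NaN, and NaN propagates through the sum.
    c1 += exp_like_numpy(1.0 / mu) * norm_cdf(-fac * (x + mu) / mu) * exp_like_numpy(1.0 / mu)
    return c1


mu_values = [4.17022005e-03, 7.20324493e-03, 1.14374817e-06,
             3.02332573e-03, 1.46755891e-03]
results = [naive_invgauss_cdf(0.4, mu) for mu in mu_values]
print(results)
```

Only the third value of mu makes exp(1/mu) overflow, which is why only that entry becomes NaN.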

I experimented with handling the overflow by clamping exp(1/mu) to the largest representable double; the remainder of the function then evaluates to the “correct” value, which is 1.

In [4]: mu = np.random.uniform(0., 0.01, size=5)

In [5]: invgauss.cdf(0.4, mu=mu)
Out[5]: array([1., 1., 1., 1., 1.])

In [6]: mu
Out[6]: array([0.0006815 , 0.00685858, 0.00949644, 0.00324687, 0.00621239])

In [7]: rng = np.random.RandomState(1)

In [8]: mu = rng.uniform(0., 0.01, size=5)

In [9]: mu
Out[9]: 
array([4.17022005e-03, 7.20324493e-03, 1.14374817e-06, 3.02332573e-03,
       1.46755891e-03])

In [10]: invgauss.cdf(0.4, mu=mu)
Out[10]: array([1., 1., 1., 1., 1.])
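
The clamping idea can be sketched in plain Python as follows. This is an assumption-laden illustration, not the code in the linked branch: `clamped_exp` and `clamped_invgauss_cdf` are hypothetical names, and `sys.float_info.max` stands in for "the largest double".

```python
import math
import sys


def norm_cdf(z):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def clamped_exp(t):
    """exp(t), replaced by the largest finite double when it would overflow."""
    try:
        return math.exp(t)
    except OverflowError:
        return sys.float_info.max


def clamped_invgauss_cdf(x, mu):
    """Same expression as in the warning, with the exponential clamped."""
    fac = math.sqrt(1.0 / x)
    tail = norm_cdf(-fac * (x + mu) / mu)
    # A finite (clamped) factor times an exact 0.0 is 0.0, so no NaN appears.
    return norm_cdf(fac * (x - mu) / mu) + clamped_exp(1.0 / mu) * tail * clamped_exp(1.0 / mu)


mu_values = [4.17022005e-03, 7.20324493e-03, 1.14374817e-06,
             3.02332573e-03, 1.46755891e-03]
results = [clamped_invgauss_cdf(0.4, mu) for mu in mu_values]
print(results)
```

Clamping is only harmless when the normal-CDF factor has underflowed to exactly zero. If that factor is tiny but nonzero while the exponential is clamped, the product no longer represents the true term.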

Here is the code snippet: https://github.com/scipy/scipy/compare/master...zoj613:invgauss

I was wondering what everyone thinks about changing the current behavior of the function.

SciPy/NumPy/Python version information: 1.7.0.dev0+fc77ea1 1.20.1 sys.version_info(major=3, minor=8, micro=6, releaselevel='final', serial=0)

Top GitHub Comments

josef-pkt commented, Feb 25, 2021

I guess setting it to 1 independent of the x value will not be correct. If x is very small, smaller than mu, then the cdf will be zero.

>>> stats.invgauss.cdf(1e-6, mu=1e-2)
0.0

I didn’t check the details, cdf might converge to a jump, and pdf converges to a spike just above zero.
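
To illustrate the dependence on x, here is a small sketch that evaluates the CDF on both sides of mu for mu = 1e-2, where exp(2/mu) is still finite and the direct formula can be used. The helper names are illustrative, and the formula is the one from the SciPy source quoted in the issue, with the two exponential factors combined.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def invgauss_cdf(x, mu):
    """Direct evaluation; fine here because mu = 1e-2 keeps exp(2/mu) finite."""
    fac = math.sqrt(1.0 / x)
    return (norm_cdf(fac * (x - mu) / mu)
            + math.exp(2.0 / mu) * norm_cdf(-fac * (x + mu) / mu))


mu = 1e-2
xs = [1e-6, 5e-3, 1e-2, 2e-2, 0.4]
values = [invgauss_cdf(x, mu) for x in xs]
for x, v in zip(xs, values):
    print(f"x = {x:g}: cdf = {v:.6g}")
```

The values rise from 0 well below mu to 1 well above it, so a fix that returned 1 regardless of x would be wrong for x below mu.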

mdhaber commented, Mar 1, 2021

Can we use logs to avoid the overflow? Update: Yes. See suggestion in gh-13616.
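
One way to read the "use logs" suggestion is to add the exponents before exponentiating, so that exp(2/mu) is never formed on its own. The sketch below is a guess at the idea, not necessarily what gh-13616 does. It uses only the `math` module, and `log_norm_cdf` is a hand-written approximation (an asymptotic tail expansion below z = -20); in SciPy one would presumably use `scipy.special.log_ndtr` instead.

```python
import math

_LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def log_norm_cdf(z):
    """Approximate log of the normal CDF that stays finite in the far left tail."""
    if z > -20.0:
        return math.log(norm_cdf(z))
    # Asymptotic expansion for z -> -inf:
    # Phi(z) ~ exp(-z^2/2) / (|z| sqrt(2 pi)) * (1 - 1/z^2 + 3/z^4)
    z2 = z * z
    return (-0.5 * z2 - math.log(-z) - _LOG_SQRT_2PI
            + math.log1p(-1.0 / z2 + 3.0 / (z2 * z2)))


def invgauss_cdf_logspace(x, mu):
    """invgauss CDF with the overflowing factor handled in log space."""
    fac = 1.0 / math.sqrt(x)
    first = norm_cdf(fac * (x - mu) / mu)
    # 2/mu is large and positive, log_norm_cdf(...) is large and negative;
    # their sum is moderate, so the exponential neither overflows nor gives NaN.
    second = math.exp(2.0 / mu + log_norm_cdf(-fac * (x + mu) / mu))
    return first + second


mu_values = [4.17022005e-03, 7.20324493e-03, 1.14374817e-06,
             3.02332573e-03, 1.46755891e-03]
results = [invgauss_cdf_logspace(0.4, mu) for mu in mu_values]
print(results)
print(invgauss_cdf_logspace(1e-6, 1e-2))
```

Unlike clamping, this form still carries information when x is close to a small mu, where the second term is small but not negligible.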
