BUG: invgauss.cdf should return the correct value when `mu` is very small.

Currently invgauss.cdf returns NaN when mu is too small. This is due to exp(1/mu) overflowing when mu is small (the docs say that this happens for values smaller than 0.0028). https://github.com/scipy/scipy/blob/fc77ea19923c39618547b4033a9185dd8a3afcc1/scipy/stats/_continuous_distns.py#L3504-L3509

In the expression evaluating the CDF, the term _norm_cdf(-fac*(x+mu)/mu) is zero when mu is very small, so the CDF evaluates to 1, because the term _norm_cdf(fac*(x-mu)/mu) is approximately 1 for very small mu.

I believe that returning nan instead of 1 is not the best way to handle the overflow. The overflow of exp does not practically affect the final value of the cdf in cases when mu is very small, because _norm_cdf(-fac*(x+mu)/mu) is zero anyway.
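To make the mechanism concrete, here is a minimal sketch that evaluates the individual factors for the smallest mu from the example below. It assumes scipy.special.ndtr as a stand-in for the private _norm_cdf helper; the variable names are illustrative only.

```python
import numpy as np
from scipy.special import ndtr  # standard normal CDF, stand-in for _norm_cdf

x, mu = 0.4, 1.14374817e-06
fac = np.sqrt(1.0 / x)

# Silence the warnings so the individual values can be inspected.
with np.errstate(over="ignore", invalid="ignore"):
    big = np.exp(1.0 / mu)               # overflows to inf
    tail = ndtr(-fac * (x + mu) / mu)    # underflows to exactly 0.0
    product = big * tail * big           # inf * 0 is nan

first_term = ndtr(fac * (x - mu) / mu)   # saturates at 1.0

print(big, tail, product, first_term)
```

The nan comes only from the inf * 0 product; the first term on its own is already 1.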

Reproducing code example:

In [1]: import numpy as np

In [2]: from scipy.stats import invgauss

In [3]: rng = np.random.RandomState(1)

In [4]: mu = rng.uniform(0., 0.01, size=5)

In [5]: mu
Out[5]: 
array([4.17022005e-03, 7.20324493e-03, 1.14374817e-06, 3.02332573e-03,
       1.46755891e-03])

In [6]: invgauss.cdf(0.4, mu=mu)
/home/scipy/scipy/stats/_continuous_distns.py:3508: RuntimeWarning: overflow encountered in exp
  C1 += np.exp(1.0/mu) * _norm_cdf(-fac*(x+mu)/mu) * np.exp(1.0/mu)
/home/scipy/scipy/stats/_continuous_distns.py:3508: RuntimeWarning: invalid value encountered in multiply
  C1 += np.exp(1.0/mu) * _norm_cdf(-fac*(x+mu)/mu) * np.exp(1.0/mu)
Out[6]: array([ 1.,  1., nan,  1.,  1.])

I played around with handling the overflow by setting the value of exp(1/mu) to the largest double and then the remainder of the function evaluates to the “correct” value, which is 1.

In [4]: mu = np.random.uniform(0., 0.01, size=5)

In [5]: invgauss.cdf(0.4, mu=mu)
Out[5]: array([1., 1., 1., 1., 1.])

In [6]: mu
Out[6]: array([0.0006815 , 0.00685858, 0.00949644, 0.00324687, 0.00621239])

In [7]: rng = np.random.RandomState(1)

In [8]: mu = rng.uniform(0., 0.01, size=5)

In [9]: mu
Out[9]: 
array([4.17022005e-03, 7.20324493e-03, 1.14374817e-06, 3.02332573e-03,
       1.46755891e-03])

In [10]: invgauss.cdf(0.4, mu=mu)
Out[10]: array([1., 1., 1., 1., 1.])

Here is the code snippet: https://github.com/scipy/scipy/compare/master...zoj613:invgauss
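The clamping idea described above can be sketched as a standalone function. This is an illustration only, not the code on the linked branch: the name invgauss_cdf_clamped is hypothetical, scipy.special.ndtr stands in for the private _norm_cdf helper, and the result is only meaningful when the second normal CDF term has underflowed to exactly zero.

```python
import numpy as np
from scipy.special import ndtr  # standard normal CDF, stand-in for _norm_cdf


def invgauss_cdf_clamped(x, mu):
    """Hypothetical sketch: invgauss CDF with exp(1/mu) clamped to the largest double."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    fac = np.sqrt(1.0 / x)
    with np.errstate(over="ignore"):
        e = np.exp(1.0 / mu)
    # Replace inf by the largest finite double so that e * 0 is 0, not nan.
    e = np.minimum(e, np.finfo(float).max)
    c1 = ndtr(fac * (x - mu) / mu)
    # Same factor order as the original expression: (e * tail) * e.
    c1 = c1 + e * ndtr(-fac * (x + mu) / mu) * e
    return c1


rng = np.random.RandomState(1)
mu = rng.uniform(0., 0.01, size=5)
result = invgauss_cdf_clamped(0.4, mu)
print(result)
```

If the tail term is tiny but nonzero, multiplying it by the clamped value twice can still overflow, so this is a workaround for the nan rather than a general fix.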

I was wondering what everyone thinks about changing the current behavior of the function.

Scipy/Numpy/Python version information:

1.7.0.dev0+fc77ea1 1.20.1 sys.version_info(major=3, minor=8, micro=6, releaselevel='final', serial=0)

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments:6 (6 by maintainers)

Top GitHub Comments

josef-pkt commented, Feb 25, 2021 (1 reaction)

I guess setting it to 1 independent of the x value will not be correct. If x is very small, smaller than mu, then the cdf will be zero.

stats.invgauss.cdf(1e-6, mu=1e-2)
0.0

I didn’t check the details, but the cdf might converge to a jump, and the pdf to a spike just above zero.

mdhaber commented, Mar 1, 2021 (0 reactions)

Can we use logs to avoid the overflow? Update: Yes. See suggestion in gh-13616.
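One way the log-based idea could look is sketched below. It is not necessarily what gh-13616 implements: the function name invgauss_cdf_logspace is hypothetical, and scipy.special.ndtr and scipy.special.log_ndtr are used in place of SciPy's private helpers. The product exp(2/mu) * Phi(-fac*(x+mu)/mu) is formed as a sum of logarithms and exponentiated only once, so no intermediate inf appears.

```python
import numpy as np
from scipy.special import ndtr, log_ndtr


def invgauss_cdf_logspace(x, mu):
    """Hypothetical sketch: invgauss CDF with the overflow-prone term computed in log space."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    fac = np.sqrt(1.0 / x)
    first = ndtr(fac * (x - mu) / mu)
    # log(exp(2/mu) * Phi(z)) = 2/mu + log(Phi(z)); the sum stays finite.
    log_second = 2.0 / mu + log_ndtr(-fac * (x + mu) / mu)
    return first + np.exp(log_second)


rng = np.random.RandomState(1)
mu = rng.uniform(0., 0.01, size=5)
small_mu = invgauss_cdf_logspace(0.4, mu)       # no nan for the tiny mu
small_x = invgauss_cdf_logspace(1e-6, 1e-2)     # the small-x case from the comment above
print(small_mu, small_x)
```

Unlike forcing the result to 1, this keeps the dependence on x, so the small-x case raised in the earlier comment still evaluates to 0.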
