BUG: invgauss.cdf should return the correct value when `mu` is very small.
Currently `invgauss.cdf` returns NaN when `mu` is too small. This is due to `exp(1/mu)` blowing up when `mu` is small (the docs say that this happens for values smaller than 0.0028).
https://github.com/scipy/scipy/blob/fc77ea19923c39618547b4033a9185dd8a3afcc1/scipy/stats/_continuous_distns.py#L3504-L3509
In the expression evaluating the CDF, the term `_norm_cdf(-fac*(x+mu)/mu)` is zero when `mu` is very small, so the CDF evaluates to 1 because the term `_norm_cdf(fac*(x-mu)/mu)` is approximately 1 for very small `mu`.
I believe that returning `nan` instead of 1 is not the best way to handle the overflow. The overflow of `exp` does not practically affect the final value of the CDF in cases when `mu` is very small, because `_norm_cdf(-fac*(x+mu)/mu)` is zero anyway.
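To make the failure mode concrete, here is a minimal sketch of the expression that appears in the warning message of the reproducing example. It is not the SciPy source: `naive_invgauss_cdf` is a hypothetical name, `scipy.special.ndtr` stands in for the private `_norm_cdf`, and `fac` is assumed to be `1/sqrt(x)`, which this issue does not show.

```python
import numpy as np
from scipy.special import ndtr  # standard normal CDF, stand-in for _norm_cdf


def naive_invgauss_cdf(x, mu):
    """Direct evaluation of the CDF expression quoted in the warning (scale=1)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    fac = 1.0 / np.sqrt(x)  # assumption: not shown in the issue text
    first = ndtr(fac * (x - mu) / mu)
    # For tiny mu, exp(1/mu) overflows to inf while ndtr(...) underflows to 0,
    # and inf * 0 evaluates to nan.
    with np.errstate(over="ignore", invalid="ignore"):
        second = np.exp(1.0 / mu) * ndtr(-fac * (x + mu) / mu) * np.exp(1.0 / mu)
    return first + second


# Two of the mu values from the reproducing example.
result = naive_invgauss_cdf(0.4, np.array([4.17022005e-03, 1.14374817e-06]))
print(result)
```

In this sketch the first entry stays finite because `exp(1/mu)` is still representable, while the second hits the `inf * 0` case.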
Reproducing code example:
In [1]: import numpy as np
In [2]: from scipy.stats import invgauss
In [3]: rng = np.random.RandomState(1)
In [4]: mu = rng.uniform(0., 0.01, size=5)
In [5]: mu
Out[5]:
array([4.17022005e-03, 7.20324493e-03, 1.14374817e-06, 3.02332573e-03,
1.46755891e-03])
In [6]: invgauss.cdf(0.4, mu=mu)
/home/scipy/scipy/stats/_continuous_distns.py:3508: RuntimeWarning: overflow encountered in exp
  C1 += np.exp(1.0/mu) * _norm_cdf(-fac*(x+mu)/mu) * np.exp(1.0/mu)
/home/scipy/scipy/stats/_continuous_distns.py:3508: RuntimeWarning: invalid value encountered in multiply
  C1 += np.exp(1.0/mu) * _norm_cdf(-fac*(x+mu)/mu) * np.exp(1.0/mu)
Out[6]: array([ 1., 1., nan, 1., 1.])
I played around with handling the overflow by setting the value of `exp(1/mu)` to the largest double; the remainder of the function then evaluates to the “correct” value, which is 1.
In [4]: mu = np.random.uniform(0., 0.01, size=5)
In [5]: invgauss.cdf(0.4, mu=mu)
Out[5]: array([1., 1., 1., 1., 1.])
In [6]: mu
Out[6]: array([0.0006815 , 0.00685858, 0.00949644, 0.00324687, 0.00621239])
In [7]: rng = np.random.RandomState(1)
In [8]: mu = rng.uniform(0., 0.01, size=5)
In [9]: mu
Out[9]:
array([4.17022005e-03, 7.20324493e-03, 1.14374817e-06, 3.02332573e-03,
1.46755891e-03])
In [10]: invgauss.cdf(0.4, mu=mu)
Out[10]: array([1., 1., 1., 1., 1.])
Here is the code snippet: https://github.com/scipy/scipy/compare/master...zoj613:invgauss
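The linked branch is not reproduced here. As a rough illustration of the clamping idea only, the sketch below caps `exp(1/mu)` at the largest finite double before multiplying. The function name `clamped_invgauss_cdf` is hypothetical, `scipy.special.ndtr` stands in for the private `_norm_cdf`, and `fac = 1/sqrt(x)` is an assumption; the actual patch may differ.

```python
import numpy as np
from scipy.special import ndtr  # standard normal CDF, stand-in for _norm_cdf


def clamped_invgauss_cdf(x, mu):
    """CDF expression with exp(1/mu) capped at the largest finite double."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    fac = 1.0 / np.sqrt(x)  # assumption: not shown in the issue text
    largest = np.finfo(float).max
    with np.errstate(over="ignore"):
        # inf is replaced by the largest double, so a zero tail gives 0, not nan
        e = np.minimum(np.exp(1.0 / mu), largest)
    tail = ndtr(-fac * (x + mu) / mu)
    return ndtr(fac * (x - mu) / mu) + (e * tail) * e


mu = np.array([4.17022005e-03, 7.20324493e-03, 1.14374817e-06,
               3.02332573e-03, 1.46755891e-03])
print(clamped_invgauss_cdf(0.4, mu))
```

Note that the clamp does not hard-code the result to 1: the first `ndtr` term still depends on `x`, so the value can still drop toward 0 when `x` is below `mu`.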
I was wondering what everyone thinks about changing the current behavior of the function.
SciPy/NumPy/Python version information:
1.7.0.dev0+fc77ea1 1.20.1 sys.version_info(major=3, minor=8, micro=6, releaselevel='final', serial=0)
I guess setting it to 1 independent of the x value will not be correct. If x is very small, smaller than mu, then the cdf will be zero.
>>> stats.invgauss.cdf(1e-6, mu=1e-2)
0.0
I didn’t check the details, but the cdf might converge to a jump, with the pdf converging to a spike just above zero.
Can we use logs to avoid the overflow? Update: Yes. See suggestion in gh-13616.
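For readers who want to see what a log-based evaluation could look like, here is a minimal sketch. It is not necessarily the form suggested in gh-13616: `logspace_invgauss_cdf` is a hypothetical name, `fac = 1/sqrt(x)` is an assumption, and the public `scipy.special.ndtr` and `scipy.special.log_ndtr` are used in place of SciPy's private helpers. The idea is to add `2/mu` to the log of the normal CDF term and exponentiate only the sum, so `exp(2/mu)` is never formed on its own.

```python
import numpy as np
from scipy.special import log_ndtr, ndtr


def logspace_invgauss_cdf(x, mu):
    """CDF expression with the second term assembled in log space."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    fac = 1.0 / np.sqrt(x)  # assumption: not shown in the issue text
    first = ndtr(fac * (x - mu) / mu)
    # log(exp(2/mu) * Phi(-fac*(x+mu)/mu)) = 2/mu + log Phi(-fac*(x+mu)/mu)
    log_second = 2.0 / mu + log_ndtr(-fac * (x + mu) / mu)
    return first + np.exp(log_second)


mu = np.array([4.17022005e-03, 7.20324493e-03, 1.14374817e-06,
               3.02332573e-03, 1.46755891e-03])
print(logspace_invgauss_cdf(0.4, mu))
print(logspace_invgauss_cdf(1e-6, 1e-2))
```

Unlike fixing the result to 1, this form keeps the dependence on `x`, so it also covers the case raised above where `x` is smaller than `mu`.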