
WIP: Addition of MLE for stats.invgauss/wald


Part of issue #11782.

@mdhaber,

I have been investigating the maximum likelihood estimators given in Chapter 25, Inverse Gaussian (Wald) Distribution, of the textbook we have been using, but the results from those MLEs don’t come close to the parameters of either the wald or invgauss distribution in scipy.

Here is how the PDF is described in the textbook, on page 120 (screenshot omitted; the standard inverse Gaussian density is f(x; μ, λ) = √(λ / (2πx³)) · exp(−λ(x − μ)² / (2μ²x))).

And here are the equations for the MLE (screenshot omitted; the standard estimators are μ̂ = x̄ and 1/λ̂ = (1/n) Σ (1/xᵢ − 1/μ̂)).

It may be another issue with terminology: what the textbook calls the location parameter mu is implemented as a shape parameter in stats.invgauss. Moreover, the mean of a random sample only matches the mu used in generation when loc and scale are left at their defaults, loc=0 and scale=1.

For example,

from scipy.stats import invgauss
data = invgauss.rvs(mu=3.25, size=10000)
print(data.mean())

=> 3.223328.... Just one example, but it appears that in general it matches.

With other location and scale values it doesn’t match.

from scipy.stats import invgauss
data = invgauss.rvs(mu=3.25, scale=2, size=10000)
print(data.mean())

=> 6.62447.... Just one example, but in general they don’t match.

The code for the scale (the textbook’s λ̂) is

import numpy as np
scale = len(data) / np.sum(1/data - 1/mu)

and I played around with both of these equations quite a bit to try to get them to match up with, or do better than, the default fit method. Other than the relation I described above, I’m not sure these will work for these distributions. It may be possible to use the mean to determine the shape parameter in invgauss if the user has already fixed floc=0, scale=1, but that seems like a very niche usage.
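As a sketch of the standard loc=0, scale=1 case (variable names are illustrative; this assumes the textbook estimators μ̂ = x̄ and 1/λ̂ = (1/n) Σ (1/xᵢ − 1/μ̂)):

```python
import numpy as np
from scipy.stats import invgauss

rng = np.random.default_rng(0)
# Default loc=0, scale=1: here SciPy's shape parameter coincides with the
# textbook's mean parameter mu.
data = invgauss.rvs(3.25, size=200_000, random_state=rng)

mu_hat = data.mean()                             # textbook: mu-hat = sample mean
lam_hat = len(data) / np.sum(1/data - 1/mu_hat)  # textbook: lambda-hat
print(mu_hat, lam_hat)  # mu_hat near 3.25, lam_hat near 1
```

With the defaults, the book’s λ comes out near 1, consistent with it playing the role of the scale.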

What are your thoughts?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
mdhaber commented, Jul 7, 2020

I think if you go a bit further, you’ll find that with loc=0 they are equivalent: the book’s μ corresponds to SciPy’s mu · scale, and the book’s λ corresponds to SciPy’s scale.

So if the user passes in floc=0, you could use the equations from the book to get the book’s version of the parameters, then use the relationships above to get SciPy’s parameters.

Update: Yes:

import numpy as np
from scipy.stats import invgauss
data = invgauss.rvs(mu=3.25, scale=2, size=1000000)

mu = np.mean(data)                               # book's mu-hat: the sample mean
s = len(data) / (np.sum(data**(-1) - mu**(-1)))  # book's lambda-hat, i.e. SciPy's scale
mu_s = mu/s                                      # SciPy's shape parameter mu
print(mu_s, s)

gives

3.2698217368812785 1.9933346841496646

If the user doesn’t pass in floc=0, then you could at least use the analytical solution above (assuming floc=0) as a guess to the super fit method. You might try following @WarrenWeckesser’s argument here about weibull_min to see if it applies to this distribution.
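A minimal sketch of that idea (illustrative only, not SciPy’s actual implementation): compute the analytical loc=0 solution, convert it to SciPy’s parameterization, and pass it as the initial guess to the generic numerical fit:

```python
import numpy as np
from scipy.stats import invgauss

rng = np.random.default_rng(1)
data = invgauss.rvs(3.25, scale=2, size=100_000, random_state=rng)

# Analytical MLE in the book's parameterization, assuming loc=0.
m = data.mean()
lam = len(data) / np.sum(1/data - 1/m)

# Convert to SciPy's parameterization (mu = m/lam, scale = lam) and use it
# as the starting guess for the generic numerical fit over all parameters.
mu_fit, loc_fit, scale_fit = invgauss.fit(data, m/lam, loc=0, scale=lam)
print(mu_fit, loc_fit, scale_fit)
```

Starting the optimizer near the analytical solution should make the generic fit both faster and more reliable than the default starting point.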

0 reactions
swallan commented, Jul 7, 2020

@mdhaber @WarrenWeckesser

Thanks for the pointers! I really appreciate it. I’ll create a PR for this soon.
