fitting discrete distributions

Currently we can use the scipy.stats.rv_continuous.fit method to estimate the parameters of a theoretical continuous distribution from empirical data; however, it is not implemented for discrete distributions, e.g. negative binomial and Poisson. Could it be implemented in the near future?
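To make the gap concrete, here is a minimal sketch of what the continuous side offers. The synthetic data, the seed, and the variable names are assumptions for illustration; `norm.fit` is the existing `rv_continuous.fit` method, and the last line simply checks whether a discrete distribution object exposes a `fit` attribute in the installed SciPy.

```python
import numpy as np
from scipy import stats

# Synthetic data (an assumption for illustration): normal with loc=2, scale=3.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=3.0, size=5000)

# Continuous distributions expose .fit, which returns MLE estimates;
# for norm these are (loc, scale).
loc_hat, scale_hat = stats.norm.fit(data)
print(loc_hat, scale_hat)

# Compare what the continuous and discrete distribution objects expose.
print(hasattr(stats.norm, "fit"), hasattr(stats.poisson, "fit"))
```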

Issue Analytics

Created 3 years ago
Reactions: 3
Comments: 28 (20 by maintainers)

Top GitHub Comments

6 reactions

rlucas7 commented, May 6, 2020

discrete distributions taken from here: https://github.com/scipy/scipy/blob/master/scipy/stats/_distr_params.py#L115

dist name	mle	mme
bernoulli	sample mean	.
betabinom		mme
binom		n + p; only p; only n*
boltzmann	see note	.
dlaplace	mle^2	.
geom	mle	.
hypergeom	.	.
logser		.
nbinom	mle^1	.
planck		.
poisson	mle
randint	mle for upper bound	.
skellam	mle^3	.
zipf		.
yulesimon	mle	only if >1

* Note: there are several proposals for the ‘only n’ case.

^1 Note: the UMVUE dominates the MLE; the MLE is biased and is not the MME.

^2 MLE: \hat{p} = \frac{T}{1+\sqrt{1+T}}, where T = \sum_{i=1}^n |x_i| (eqn (32) in the linked paper); immediately afterwards the authors state that MLE = MME, and they also give a distributional result.

^3 The MLE for Skellam is a special case of the root-finding problem considered in section 3.2 of that paper; it corresponds to COM-Poisson with \nu_i = 1 for i = 1, 2.
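A few of the closed-form entries in the table can be sketched in plain NumPy. This is only an illustration, not code from the thread: all function names are hypothetical, `geom_mle` assumes SciPy's `geom` convention (support starting at 1), `binom_mme` is the textbook two-moment estimator for the "n + p" case, and `dlaplace_mle` assumes the pmf (1-p)/(1+p) * p^|k| and is written in terms of the sample mean of |x_i| (obtained by setting the derivative of the log-likelihood to zero), so it may not match footnote 2 symbol for symbol.

```python
import numpy as np

def poisson_mle(x):
    """MLE of the Poisson rate: the sample mean."""
    return float(np.mean(x))

def geom_mle(x):
    """MLE of p for a geometric on {1, 2, ...}: one over the sample mean."""
    return 1.0 / float(np.mean(x))

def binom_mme(x):
    """Method of moments for (n, p), from mean = n*p and var = n*p*(1-p)."""
    m, v = float(np.mean(x)), float(np.var(x))
    p = 1.0 - v / m  # not usable when v >= m (overdispersed sample)
    return m / p, p

def dlaplace_mle(x):
    """MLE of p for pmf (1-p)/(1+p) * p**|k| (assumed parametrization)."""
    m = float(np.mean(np.abs(x)))  # sample MEAN of |x_i|, not the raw sum
    return m / (1.0 + np.sqrt(1.0 + m * m))

rng = np.random.default_rng(0)
size = 200_000

lam_hat = poisson_mle(rng.poisson(3.0, size))
p_geom = geom_mle(rng.geometric(0.25, size))
n_mme, p_mme = binom_mme(rng.binomial(5, 0.4, size))

# The difference of two iid geometrics with success probability 1 - p
# follows the discrete Laplace pmf above (here p = 0.6).
x_dl = rng.geometric(0.4, size) - rng.geometric(0.4, size)
p_dl = dlaplace_mle(x_dl)

print(lam_hat, p_geom, (n_mme, p_mme), p_dl)
```

Note that the moment estimate of n is generally not an integer; rounding it, or fixing n and estimating only p, are separate design choices.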

Boltzmann note: Boltzmann is a truncated geometric distribution. An MLE for the probability of a geometric exists, so incorporating a truncation point should be straightforward. An estimator of the truncation point is not known (to me), but it should also be straightforward to work out. Joint estimation of the truncation point and the probability would need to be worked out; it is unclear to me at this time whether there are technical difficulties in this joint case.
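One possible way to approach the joint case is sketched below; it is not taken from the thread. It assumes SciPy's `boltzmann` parametrization, pmf(k) = (1 - exp(-lambda)) * exp(-lambda * k) / (1 - exp(-lambda * N)) for k = 0, ..., N-1. For any fixed lambda > 0 the factor 1 / (1 - exp(-lambda * N)) shrinks as N grows, so the likelihood is largest at the smallest N that still covers the data, N = max(x) + 1; lambda can then be found by a one-dimensional search. The function name `fit_boltzmann` and the search interval are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

def fit_boltzmann(x):
    """Sketch of a joint MLE for (lambda, N) of scipy.stats.boltzmann."""
    x = np.asarray(x)
    # Smallest N whose support {0, ..., N-1} contains every observation.
    n_hat = int(x.max()) + 1

    def nll(lam):
        # Negative log-likelihood in lambda with N held at n_hat.
        return -stats.boltzmann.logpmf(x, lam, n_hat).sum()

    # The search interval for lambda is an arbitrary assumption.
    res = minimize_scalar(nll, bounds=(1e-6, 20.0), method="bounded")
    return res.x, n_hat

rng = np.random.default_rng(0)
sample = stats.boltzmann.rvs(0.5, 10, size=5000, random_state=rng)
lam_hat_b, n_hat_b = fit_boltzmann(sample)
print(lam_hat_b, n_hat_b)
```

Like any estimate built on the sample maximum, the estimated N can never exceed the true truncation point plus nothing more than the data show, so it is biased downward for small samples.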

Note: estimators for the remaining distributions (logseries, hypergeometric, planck, zipf) may exist, but I was unable to find references.

The existence of estimators in the discrete case is better than I expected, but note that adding a fit method could preclude the inclusion of some distributions. For example, there is an open PR with a Poisson-binomial distribution, and AFAIK there is no known MLE for the Poisson-binomial.

After doing the research I’ve gone from a -0.0 on this to a +0.3 on adding the fit. I’d recommend opening a discussion thread on the topic on the scipy-dev list. You’ll reach more people there than here and get better opinions than mine. 😃

4 reactions

mdhaber commented, Dec 11, 2020

In the meantime:

```python
import warnings

import numpy as np
from scipy.optimize import brute, differential_evolution
from scipy.stats import binom


def func(free_params, *args):
    dist, x = args
    # negative log-likelihood function
    ll = -np.log(dist.pmf(x, *free_params)).sum()
    if np.isnan(ll):  # NaN, e.g. when the pmf is undefined for these parameters
        ll = np.inf   # we don't want that
    return ll


def fit_discrete(dist, x, bounds, optimizer=brute):
    # silence the warnings raised while the optimizer probes bad regions
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return optimizer(func, bounds, args=(dist, x))


n, p = 5, 0.4
x = binom.rvs(n, p, size=10000)

bounds = [(0, 100), (0, 1)]  # search ranges for n and p
n_hat, p_hat = fit_discrete(binom, x, bounds)  # brute returns the best point
res = fit_discrete(binom, x, bounds, optimizer=differential_evolution)
print(n_hat, p_hat)
print(res.x)
```

Output:

```
5.000011549883961 0.3978198921753501
[5.01158844 0.39695459]
```

I am curious how well something so simple works, so if you try it out, let me know how it goes.
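For readers on a newer SciPy, a similar fit can be sketched with `scipy.stats.fit`, which is assumed here to be available (it was added in SciPy 1.9, after this comment was written). The seed and sample size are arbitrary choices for illustration; the bounds mirror the ones used above.

```python
import numpy as np
from scipy import stats

# Assumes SciPy >= 1.9, where scipy.stats.fit is available.
rng = np.random.default_rng(12345)
x = stats.binom.rvs(5, 0.4, size=5000, random_state=rng)

# Bounds for the shape parameters (n, p), in that order.
# loc is not listed, so it is left at its default of 0.
bounds = [(0, 100), (0, 1)]
res = stats.fit(stats.binom, x, bounds)

# res.params is a named tuple with fields n, p and loc.
print(res.params)
```

The default optimizer is stochastic, so the estimates can differ slightly from run to run.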
