fitting discrete distributions

Currently we can use the scipy.stats.rv_continuous.fit method to estimate the parameters of a theoretical continuous distribution from empirical data. However, it is not implemented for discrete distributions, e.g. negative binomial and Poisson. Could it be implemented in the near future?
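To make the gap concrete, here is a minimal sketch of what the question describes: a continuous distribution fitted with its built-in fit method, next to a check of whether a discrete distribution object offers the same method. The sample sizes, seed, and variable names are assumptions chosen for illustration, and the result of the hasattr check depends on the installed SciPy version.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Continuous case: rv_continuous.fit returns maximum-likelihood estimates.
data = stats.norm.rvs(loc=2.0, scale=3.0, size=5000, random_state=rng)
loc_hat, scale_hat = stats.norm.fit(data)

# Discrete case: check whether the distribution object exposes a fit method.
# (Illustrative check only; the answer depends on the SciPy version.)
has_discrete_fit = hasattr(stats.poisson, "fit")

print(loc_hat, scale_hat, has_discrete_fit)
```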

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 3
  • Comments: 28 (20 by maintainers)

Top GitHub Comments

6 reactions
rlucas7 commented, May 6, 2020

discrete distributions taken from here: https://github.com/scipy/scipy/blob/master/scipy/stats/_distr_params.py#L115

dist name   | mle                   | mme
bernoulli   | sample mean           | .
betabinom   |                       | mme
binom       | n + p; only p         | only n*
boltzmann   | see note              | .
dlaplace    | mle^2                 | .
geom        | mle                   | .
hypergeom   | .                     | .
logser      | .                     |
nbinom      | mle^1                 | .
planck      | .                     |
poisson     | mle                   |
randint     | mle for upper bound   | .
skellam     | mle^3                 | .
zipf        | .                     |
yulesimon   | mle only if >1        |
* Note: there are several proposals for the ‘only n’ case.
^1 Note: the UMVUE dominates the MLE; the MLE is biased and is not the MME.
^2 MLE \hat{p} = \frac{T}{1+\sqrt{1+T^2}} where T = \frac{1}{n}\sum_{i=1}^n |x_i|, eqn (32) in the linked paper; immediately afterwards the authors state that MLE = MME and also give a distributional result.
^3 The MLE for Skellam is a special case of the root-finding problem considered in section 3.2 of that paper; it corresponds to COM-Poisson with \nu_i = 1 for i = 1, 2.
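For the discrete Laplace footnote, the closed form can be checked numerically. This is a sketch under the assumption that the pmf is (1-p)/(1+p) * p^|k| with p = exp(-a), where a is the shape parameter of scipy.stats.dlaplace. Setting the derivative of the log-likelihood to zero gives T p^2 + 2 p - T = 0 with T the mean of |x_i|, whose root in (0, 1) is T / (1 + sqrt(1 + T^2)). The helper name dlaplace_mle is hypothetical.

```python
import numpy as np
from scipy import stats

def dlaplace_mle(x):
    """Closed-form MLE of p = exp(-a) for the pmf (1-p)/(1+p) * p**|k|."""
    m = np.mean(np.abs(x))  # mean absolute value of the sample
    return m / (1.0 + np.sqrt(1.0 + m**2))

rng = np.random.default_rng(1)
a_true = 0.8
x = stats.dlaplace.rvs(a_true, size=20000, random_state=rng)

p_hat = dlaplace_mle(x)
a_hat = -np.log(p_hat)  # convert back to SciPy's shape parameter a
print(p_hat, a_hat)
```

If every observation is zero the estimate is p = 0, so the conversion to a breaks down; a real implementation would need to handle that edge case.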

Boltzmann note: Boltzmann is a truncated geometric distribution. An MLE for the probability of a geometric exists, so incorporating a truncation point should be straightforward. An estimator of the truncation point is not known (to me) but should also be straightforward to work out. Joint estimation of the truncation point and the probability would need to be worked out. It is unclear to me at this time whether there are technical difficulties in this joint case.
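As a rough illustration of the simpler half of this note, the sketch below fits only the rate of scipy.stats.boltzmann numerically while treating the truncation point N as known. It is not a worked-out estimator: the helper name fit_boltzmann_lambda, the search interval, and the sample settings are assumptions. The last line only records the smallest truncation point compatible with the data; for a fixed rate the normalising constant shrinks as N grows, which suggests that value as a natural candidate, but the joint case is left open here as in the note.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

def fit_boltzmann_lambda(x, N):
    """Numerical MLE of lambda_ for a fixed, known truncation point N."""
    x = np.asarray(x)

    def nll(lam):
        # negative log-likelihood of the sample for this rate
        return -stats.boltzmann.logpmf(x, lam, N).sum()

    # bounded scalar search; the interval is an arbitrary assumption
    res = minimize_scalar(nll, bounds=(1e-6, 20.0), method="bounded")
    return res.x

rng = np.random.default_rng(2)
lam_true, N = 0.5, 10
x = stats.boltzmann.rvs(lam_true, N, size=20000, random_state=rng)

lam_hat = fit_boltzmann_lambda(x, N)
N_lower = int(x.max()) + 1  # smallest truncation point consistent with the sample
print(lam_hat, N_lower)
```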

Note: estimators for the remaining distributions (logseries, hypergeometric, planck, zipf) may exist, but I was unable to find references.
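Several of the entries in the table are closed-form, so they need no optimizer at all. The following sketch shows the textbook estimators for four of the listed distributions using SciPy's parameterisations (geom supported on 1, 2, ...; randint excluding its upper bound). The true parameter values and seed are arbitrary choices for the demonstration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Poisson: the MLE of mu is the sample mean.
x_pois = stats.poisson.rvs(4.2, size=10000, random_state=rng)
mu_hat = x_pois.mean()

# Bernoulli: the MLE of p is the sample mean.
x_bern = stats.bernoulli.rvs(0.3, size=10000, random_state=rng)
p_bern_hat = x_bern.mean()

# Geometric (support starts at 1 in SciPy): the MLE of p is 1 / sample mean.
x_geom = stats.geom.rvs(0.25, size=10000, random_state=rng)
p_geom_hat = 1.0 / x_geom.mean()

# randint(low, high) excludes high, so with low known the MLE of high is max + 1.
x_rand = stats.randint.rvs(0, 7, size=10000, random_state=rng)
high_hat = int(x_rand.max()) + 1

print(mu_hat, p_bern_hat, p_geom_hat, high_hat)
```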

The existence of estimators in the discrete case is better than I expected, but note that adding a fit method could preclude the inclusion of some distributions. For example, there is an open PR with a Poisson-binomial distribution, and AFAIK there is no known MLE for the Poisson-binomial.

After doing the research I’ve gone from a -0.0 on this to a +0.3 on adding the fit. I’d recommend opening a discussion thread on the topic on the scipy-dev list. You’ll reach more people there than here and get better opinions than mine. 😃

4 reactions
mdhaber commented, Dec 11, 2020

In the meantime:

import warnings

import numpy as np
from scipy.optimize import brute, differential_evolution
from scipy.stats import binom

def func(free_params, *args):
    dist, x = args
    # negative log-likelihood of the sample x under dist(*free_params)
    ll = -np.log(dist.pmf(x, *free_params)).sum()
    if np.isnan(ll):  # occurs when x is outside of support
        ll = np.inf   # we don't want that
    return ll

def fit_discrete(dist, x, bounds, optimizer=brute):
    # silence warnings (e.g. from log(0)) raised while the optimizer explores
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return optimizer(func, bounds, args=(dist, x))

# draw a sample from a binomial with known parameters
n, p = 5, 0.4
x = binom.rvs(n, p, size=10000)

bounds = [(0, 100), (0, 1)]  # search ranges for n and p
n_hat, p_hat = fit_discrete(binom, x, bounds)  # brute returns the parameter array
res = fit_discrete(binom, x, bounds, optimizer=differential_evolution)
print(n_hat, p_hat)
print(res.x)

Output:
5.000011549883961 0.3978198921753501
[5.01158844 0.39695459]

I am curious how well something so simple works, so if you try it out, let me know how it goes.
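Since the original question mentions the negative binomial, here is a hedged variation on the same idea for scipy.stats.nbinom. It is a sketch, not part of the snippet above: it computes method-of-moments estimates from the sample mean and variance (using mean = n(1-p)/p and variance = n(1-p)/p^2) and then uses them as the starting point for a bounded local optimizer instead of a global search. The helper names nbinom_mme and nbinom_mle, the choice of L-BFGS-B, and the bounds are all assumptions.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def nbinom_mme(x):
    """Method-of-moments estimates (n, p); needs sample variance > mean."""
    m, v = np.mean(x), np.var(x)
    if v <= m:
        raise ValueError("sample is not overdispersed; moment estimates undefined")
    p = m / v
    n = m * p / (1.0 - p)
    return n, p

def nbinom_mle(x):
    """Numerical MLE of (n, p), started from the moment estimates."""
    x = np.asarray(x)

    def nll(theta):
        n, p = theta
        return -stats.nbinom.logpmf(x, n, p).sum()

    res = minimize(nll, nbinom_mme(x), method="L-BFGS-B",
                   bounds=[(1e-6, None), (1e-6, 1 - 1e-6)])
    return res.x

rng = np.random.default_rng(4)
x = stats.nbinom.rvs(5, 0.4, size=20000, random_state=rng)

n_mme, p_mme = nbinom_mme(x)
n_mle, p_mle = nbinom_mle(x)
print(n_mme, p_mme)
print(n_mle, p_mle)
```

Starting from the moment estimates keeps the local search close to the optimum, at the cost of failing outright when the sample variance does not exceed the mean.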

Top Results From Across the Web

Goodness-of-Fit Tests for Discrete Distributions
Discrete distributions have a finite number of values. Learn how to perform goodness-of-fit tests to see if your data fit various discrete distributions....

How to fit a discrete distribution to count data? - Cross Validated
Methods of fitting discrete distributions: 1) Maximum Likelihood; 2) Method of moments; 3) Minimum chi-square.

Fitting a distribution for a discrete variable | Vose Software
This section discusses techniques for fitting a distribution to observations for a discrete variable. Before proceeding we suggest that you review the ...

FITTING DISCRETE DISTRIBUTIONS ON THE FIRST TWO ...
Introduction. A common way to fit a continuous distribution on the mean, EX, and the coefficient of variation, cX, of a given...

Fitting Discrete Distributions to Data With SciPy (Python)
Hi everyone! This video is about how to use the Python SciPy library to fit a probability distribution to data, using the Poisson...
