Unexpectedly poor results when distribution fitting with `weibull_min` and `exponweib`

My issue is that distribution fitting with `weibull_min` and `exponweib` returns clearly incorrect results for the shape and scale parameters.
Full details here: https://stats.stackexchange.com/questions/458652/scipy-stats-failing-to-fit-weibull-distribution-unless-location-parameter-is-con
```python
from scipy import stats

x = [4836.6, 823.6, 3131.7, 1343.4, 709.7, 610.6,
     3034.2, 1973, 7358.5, 265, 4590.5, 5440.4, 4613.7, 4763.1,
     115.3, 5385.1, 6398.1, 8444.6, 2397.1, 3259.7, 307.5, 4607.4,
     6523.7, 600.3, 2813.5, 6119.8, 6438.8, 2799.1, 2849.8, 5309.6,
     3182.4, 705.5, 5673.3, 2939.9, 2631.8, 5002.1, 1967.3, 2810.4,
     2948, 6904.8]
stats.weibull_min.fit(x)
```

Here are the results:

```
shape, loc, scale = (0.1102610560437356, 115.29999999999998, 3.428664764594809)
```
This is clearly a very poor fit to the data. I am aware that by constraining the loc parameter to zero I can get better results, but why should that be necessary? Shouldn't an unconstrained fit be more likely to overfit the data than to dramatically underfit it?
And if I do want to estimate the location parameter without constraint, why should that return such unexpected results for the shape and scale parameters?
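For reference, the workaround mentioned above can be expressed with the `floc` keyword, which `rv_continuous.fit` accepts to hold the location parameter fixed during optimization (the numbers in the comment are approximate, taken from the linked Stack Exchange discussion):

```python
from scipy import stats

x = [4836.6, 823.6, 3131.7, 1343.4, 709.7, 610.6,
     3034.2, 1973, 7358.5, 265, 4590.5, 5440.4, 4613.7, 4763.1,
     115.3, 5385.1, 6398.1, 8444.6, 2397.1, 3259.7, 307.5, 4607.4,
     6523.7, 600.3, 2813.5, 6119.8, 6438.8, 2799.1, 2849.8, 5309.6,
     3182.4, 705.5, 5673.3, 2939.9, 2631.8, 5002.1, 1967.3, 2810.4,
     2948, 6904.8]

# Fixing loc at 0 constrains the optimizer and avoids the degenerate
# solution; the shape estimate then lands near 1.4 rather than 0.11,
# in line with what R's fitdistrplus reports for the same data.
shape, loc, scale = stats.weibull_min.fit(x, floc=0)
print(shape, loc, scale)
```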
Scipy/Numpy/Python version information:

```python
import sys, scipy, numpy; print(scipy.__version__, numpy.__version__, sys.version_info)
```

```
1.4.1 1.18.1 sys.version_info(major=3, minor=6, micro=10, releaselevel='final', serial=0)
```
Issue Analytics
- Created: 3 years ago
- Comments: 50 (32 by maintainers)
A good start value is crucial for the ML method to find a good solution, especially when many parameters are to be estimated. So if `weibull_min` had a better `_fitstart` function like this:
it would resolve many cases of unexpectedly poor results when fitting the `weibull_min` distribution, and the user wouldn't have to fiddle to fine-tune the fit. Below I have compared the start values obtained from the default `stats.weibull_min._fitstart` method with the method above, and it is clear that the bad start values for the parameters are the cause of this poor fit:
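The proposed `_fitstart` replacement is not reproduced in this thread, but one common moment-based heuristic (shown purely as an illustration of the idea, not the actual patch) exploits the fact that if X follows a two-parameter Weibull with shape c and scale lam, then log X is (min-)Gumbel distributed with standard deviation pi/(c*sqrt(6)) and mean log(lam) - gamma/c:

```python
import numpy as np

def weibull_start_values(data):
    """Moment-based start values for a two-parameter Weibull fit.

    For X ~ weibull_min(c, scale=lam), log(X) follows a min-Gumbel
    distribution with std pi/(c*sqrt(6)) and mean log(lam) - gamma/c,
    where gamma is the Euler-Mascheroni constant. Inverting those two
    moments gives quick shape and scale guesses.
    """
    logx = np.log(np.asarray(data, dtype=float))
    c0 = np.pi / (np.sqrt(6) * np.std(logx, ddof=1))    # shape guess
    lam0 = np.exp(np.mean(logx) + np.euler_gamma / c0)  # scale guess
    return c0, lam0

# Data from the issue above
x = [4836.6, 823.6, 3131.7, 1343.4, 709.7, 610.6,
     3034.2, 1973, 7358.5, 265, 4590.5, 5440.4, 4613.7, 4763.1,
     115.3, 5385.1, 6398.1, 8444.6, 2397.1, 3259.7, 307.5, 4607.4,
     6523.7, 600.3, 2813.5, 6119.8, 6438.8, 2799.1, 2849.8, 5309.6,
     3182.4, 705.5, 5673.3, 2939.9, 2631.8, 5002.1, 1967.3, 2810.4,
     2948, 6904.8]
c0, lam0 = weibull_start_values(x)
```

Guesses like these can then be passed to `fit` as the positional shape argument and the `scale` keyword to seed the optimizer.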
You mean that if providing the guess `loc=x` were a stronger hint, it would be preferable? I see. Even if you provide `loc=0`, SciPy finds the crazy solution. But perhaps that is because there is no local minimum of the objective function near `loc=0` for it to settle into.

I'm not sure we can bake what you're looking for into the `fit` method. Maximum likelihood estimation is a specific way of fitting a distribution to data, and I think SciPy is doing a reasonable job of that here. You may want to define your own objective function that includes a mathematical description of sanity : ) I'm only partially joking. It's really not too tough to fit using `minimize` directly. Assuming you've already run the code above:

produces
This is not much better, but the point is that now you can change the objective function or add constraints to get something closer to what you’re looking for. (It’s just not maximum likelihood estimation anymore.)
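The maintainer's snippet itself is not shown above, but a minimal sketch of the idea (my own reconstruction, not the original code) is to minimize the negative log-likelihood directly with `scipy.optimize.minimize`, using bounds to keep the parameters in a sane region:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

# Data from the issue above
x = [4836.6, 823.6, 3131.7, 1343.4, 709.7, 610.6,
     3034.2, 1973, 7358.5, 265, 4590.5, 5440.4, 4613.7, 4763.1,
     115.3, 5385.1, 6398.1, 8444.6, 2397.1, 3259.7, 307.5, 4607.4,
     6523.7, 600.3, 2813.5, 6119.8, 6438.8, 2799.1, 2849.8, 5309.6,
     3182.4, 705.5, 5673.3, 2939.9, 2631.8, 5002.1, 1967.3, 2810.4,
     2948, 6904.8]

def nll(params):
    """Negative log-likelihood of weibull_min for the data."""
    c, loc, scale = params
    # logpdf returns -inf for points at or below loc, which makes the
    # objective infinitely bad there and steers the optimizer away.
    return -np.sum(stats.weibull_min.logpdf(x, c, loc=loc, scale=scale))

# Bounds keep shape and scale positive and loc below the sample minimum.
res = minimize(nll, x0=[1.0, 0.0, np.mean(x)],
               bounds=[(1e-3, None), (None, min(x) - 1e-6), (1e-3, None)],
               method="L-BFGS-B")
c_hat, loc_hat, scale_hat = res.x
```

From here the objective can be modified freely, e.g. dropping `loc` from the parameter vector to fix it at zero, or adding a penalty term, which is exactly the flexibility the comment above is pointing at.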