Weibull fitter: delta nan error
Hi Cam/lifelines contributors,
Thanks for this great package.
I’m trying to use the Weibull fitter, mostly successfully, but I get an error when using this data set:
from lifelines import WeibullFitter
T = [13467, 13760, 12011, 7798, 7928]
E = [0,1,0,0,0]
wf = WeibullFitter()
wf.fit(T, E)
This is the full error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-30-4b49a1327419> in <module>
3 E = [0,1,0,0,0]
4 wf = WeibullFitter()
----> 5 wf.fit(T, E)
c:\users\hcarlens\appdata\local\continuum\anaconda3\envs\lab\lib\site-packages\lifelines\fitters\weibull_fitter.py in fit(self, durations, event_observed, timeline, entry, label, alpha, ci_labels)
99
100 # estimation
--> 101 self.lambda_, self.rho_ = self._newton_rhaphson(self.durations, self.event_observed)
102 self.survival_function_ = pd.DataFrame(self.survival_function_at_times(self.timeline), columns=[self._label], index=self.timeline)
103 self.hazard_ = pd.DataFrame(self.hazard_at_times(self.timeline), columns=[self._label], index=self.timeline)
c:\users\hcarlens\appdata\local\continuum\anaconda3\envs\lab\lib\site-packages\lifelines\fitters\weibull_fitter.py in _newton_rhaphson(self, T, E, precision)
151 delta = solve(j, - step_size * g.T)
152 if np.any(np.isnan(delta)):
--> 153 raise ValueError("delta contains nan value(s). Convergence halted.")
154
155 parameters += delta
ValueError: delta contains nan value(s). Convergence halted.
I’m using Python 3.6.6 and lifelines 0.14.6. Could you help me understand what’s going wrong here? Is this a limitation in the library, or an issue with the data set? Is this something I can check for before running the fitter?
Granted, the usefulness or meaning of the ‘fit’ in this specific case is debatable, but it would still be good to know why it’s failing.
Issue created 5 years ago; 9 comments (6 by maintainers).
Top GitHub Comments
Yeah, as @CamDavidsonPilon said, the WeibullFitter does not actually converge with two data points (that was my mistaken claim). Essentially, when there are only a few data points, the data support multiple solutions for the scale and shape parameters. That is why WeibullFitter hits the maximum number of iterations.
This is a little side-tracked from your original issue, @hcarlens, but it sounds like you have your inf issue solved. In case you or others want intuition for this, it is easier to conceptualize the problem in terms of linear regression (survival distributions can be harder to conceptualize). If you have a single point, the regression model you fit to the data has an infinite number of solutions; I drew three of the infinitely many lines the point could have come from. In the bottom panel, two points narrow down the potential solutions. However, you would only be able to solve for a linear function, not higher-order terms. For higher-order terms there are still infinitely many solutions (imagine an inverted quadratic or a cubic function passing through those two points). This also ignores random error.
We can apply this intuition to parametric survival models. Neither the Exponential nor the Weibull fitter should converge with a single event, since the data support infinitely many solutions. With two data points, the exponential model may converge, since it only estimates a single parameter (I wouldn’t trust the estimate, though, since there is little support for it). The Weibull model should not converge, since we are trying to estimate two parameters from two data points. All of the above is a simple heuristic to help understand what the underlying mathematics are doing for these models.
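The single-event case can be checked numerically on the data from the question. The sketch below is an illustration under stated assumptions, not lifelines code: the helper names are hypothetical, and it assumes the parameterization S(t) = exp(-(t/λ)^ρ) with right-censored data. For each fixed shape ρ it maximizes out the scale using λ^ρ = Σ t_i^ρ / d, where d is the number of observed events, and evaluates the profile log-likelihood ℓ(ρ) = d log ρ - d log(Σ t_i^ρ / d) + (ρ - 1) Σ_events log t_j - d. Here the only event (13760) is also the longest duration, so ℓ(ρ) keeps increasing as ρ grows while λ approaches 13760: the likelihood prefers an ever-steeper step at that time and has no finite maximum, which is consistent with Newton-Raphson running away and eventually producing nan. lifelines versions have used different conventions for λ (scale versus rate), but that choice does not change this profile over ρ.

```python
import math

def log_sum_exp(values):
    # Stable log(sum(exp(v))): avoids overflow in t**rho for large rho
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def weibull_scale_hat(rho, durations, events):
    # Hypothetical helper: best scale for a fixed shape,
    # lam = (sum_i t_i**rho / d) ** (1 / rho); needs at least one event
    d = sum(events)
    log_a = log_sum_exp([rho * math.log(t) for t in durations])
    return math.exp((log_a - math.log(d)) / rho)

def weibull_profile_loglik(rho, durations, events):
    # Hypothetical helper: Weibull log-likelihood with lam maximized out, assuming
    # S(t) = exp(-(t / lam) ** rho) and right-censoring (1 = event observed, 0 = censored)
    d = sum(events)
    if d == 0:
        raise ValueError("need at least one observed event")
    log_t = [math.log(t) for t in durations]
    log_a = log_sum_exp([rho * lt for lt in log_t])     # log(sum_i t_i**rho)
    s_e = sum(lt for lt, e in zip(log_t, events) if e)  # sum of log t over events
    return d * math.log(rho) - d * (log_a - math.log(d)) + (rho - 1) * s_e - d

T = [13467, 13760, 12011, 7798, 7928]
E = [0, 1, 0, 0, 0]
for rho in (1, 10, 100, 1000):
    # The log-likelihood keeps rising and the scale approaches 13760 (the only event time)
    print(rho, weibull_profile_loglik(rho, T, E), weibull_scale_hat(rho, T, E))
```

The same reasoning gives a cheap pre-fit check for this particular failure: if there are no observed events, or every observed event falls at the largest duration, the Weibull likelihood has no finite maximum. It is not a complete test for convergence problems.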
No problem, @hcarlens. Glad to help! As for the number of observations, I am not aware of any rule of thumb for the minimum number of observations needed for a Weibull model. As always, the more events you fit the model to, the better. Realistically, Weibull models can fit a few observations fairly well, since only two parameters are estimated. However, this depends on the variance of the data (large variation may not allow a good model fit with only a few events).
One good practice is to check your Weibull model by first estimating the survival distribution non-parametrically via Kaplan-Meier (KaplanMeierFitter in lifelines). First plot the survival curve to make sure it has a general shape that makes sense (no large drop-offs or other oddities). Afterwards, you can use the plot_loglogs() method of KaplanMeierFitter to generate a log-log plot, which shows log(-log S(t)) against log t. In this plot, you are looking for a straight-line relationship between the x and y axes. If there is one, that suggests the Weibull model is a good parametric choice. If there isn’t, you would want to use another parametric model or a non-parametric method like Kaplan-Meier. Below is some sample code you can adapt to your purpose.
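As an illustration of the log-log check, here is a library-free sketch (not the commenter's code and not lifelines' implementation): it computes a Kaplan-Meier estimate and fits a least-squares line to log(-log S(t)) against log t. If the data are Weibull with S(t) = exp(-(t/λ)^ρ), then log(-log S(t)) = ρ log t - ρ log λ, so the points should fall near a straight line whose slope roughly estimates ρ, and r_squared gives a crude numeric summary of straightness. kaplan_meier and loglog_line are hypothetical helper names; in lifelines itself you would use KaplanMeierFitter and plot_loglogs() instead.

```python
import math

def kaplan_meier(durations, events):
    """Kaplan-Meier estimate: (time, survival) at each distinct time with an observed event."""
    data = sorted(zip(durations, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Events and censorings tied at time t leave the risk set together
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def loglog_line(durations, events):
    """Least-squares line of log(-log S(t)) on log t: (slope, intercept, r_squared), or None."""
    # Points with S(t) = 0 or 1 have no finite log(-log S), so they are dropped
    pts = [(math.log(t), math.log(-math.log(s)))
           for t, s in kaplan_meier(durations, events) if 0.0 < s < 1.0]
    if len(pts) < 2:
        return None  # too few usable points to judge straightness
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx  # under a Weibull model this estimates the shape rho
    return slope, my - slope * mx, sxy * sxy / (sxx * syy)

T = [13467, 13760, 12011, 7798, 7928]
E = [0, 1, 0, 0, 0]
print(loglog_line(T, E))  # None: the single event drives S(t) straight to 0
```

With the data from the question, the only event is at the longest duration, so the Kaplan-Meier curve drops straight from 1 to 0 and there is no usable point on the log-log scale, another sign that this data set cannot support a Weibull fit.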