Weibull fitter: delta nan error

Hi Cam/lifelines contributors,

Thanks for this great package.

I’m trying to use the Weibull fitter, mostly successfully, but I get an error when using this data set:

from lifelines import WeibullFitter
T = [13467, 13760, 12011, 7798, 7928]
E = [0,1,0,0,0]
wf = WeibullFitter()
wf.fit(T, E)

This is the full error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-4b49a1327419> in <module>
      3 E = [0,1,0,0,0]
      4 wf = WeibullFitter()
----> 5 wf.fit(T, E)

c:\users\hcarlens\appdata\local\continuum\anaconda3\envs\lab\lib\site-packages\lifelines\fitters\weibull_fitter.py in fit(self, durations, event_observed, timeline, entry, label, alpha, ci_labels)
     99 
    100         # estimation
--> 101         self.lambda_, self.rho_ = self._newton_rhaphson(self.durations, self.event_observed)
    102         self.survival_function_ = pd.DataFrame(self.survival_function_at_times(self.timeline), columns=[self._label], index=self.timeline)
    103         self.hazard_ = pd.DataFrame(self.hazard_at_times(self.timeline), columns=[self._label], index=self.timeline)

c:\users\hcarlens\appdata\local\continuum\anaconda3\envs\lab\lib\site-packages\lifelines\fitters\weibull_fitter.py in _newton_rhaphson(self, T, E, precision)
    151             delta = solve(j, - step_size * g.T)
    152             if np.any(np.isnan(delta)):
--> 153                 raise ValueError("delta contains nan value(s). Convergence halted.")
    154 
    155             parameters += delta

ValueError: delta contains nan value(s). Convergence halted.

I’m using Python 3.6.6, and lifelines 0.14.6. Could you help me understand what’s going wrong here? Is this a limitation in the library, or an issue with the data set? Is this something I can check for before running the fitter?

Granted, the usefulness or meaning of the ‘fit’ in this specific case is debatable, but it would still be good to know why it’s failing.
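
On the question of checking before running the fitter: one rough pre-check is to count how many events were actually observed (uncensored) and compare that with the two parameters a Weibull model estimates. The sketch below is only an illustration. `enough_events_for_weibull` is a hypothetical helper, not part of lifelines, and the default threshold of 3 is an assumption (one more than the number of parameters). Passing the check does not guarantee convergence.

```python
def enough_events_for_weibull(event_observed, min_events=3):
    """Return True if at least `min_events` events were actually observed.

    Hypothetical helper, not part of lifelines. The default of 3 is an
    assumption: one more than the two Weibull parameters (scale and shape).
    """
    n_events = sum(1 for e in event_observed if e)
    return n_events >= min_events


E = [0, 1, 0, 0, 0]  # event indicators from the question: a single observed event
ok = enough_events_for_weibull(E)
```

For the data in the question, `ok` is `False`, because only one of the five observations is an actual event and the rest are censored.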

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments:9 (6 by maintainers)

Top GitHub Comments

2 reactions
pzivich commented, Nov 13, 2018

Yeah, as @CamDavidsonPilon said, the WeibullFitter does not actually converge with two data points (that was my mistaken claim). Essentially, when there are few data points, the data support multiple solutions for the scale and shape parameters. That is why WeibullFitter hits the max iterations.

This is a little sidetracked from your original issue, @hcarlens, but it sounds like you have your inf issue solved. In case you or others want intuition for this, it is easier to conceptualize the problem in terms of linear regression (survival distributions can be harder to conceptualize). If you have a single point, the regression model you fit to the data has an infinite number of solutions. I drew 3 of the infinite number of lines the point could have come from.

[figure]

In the bottom panel, two points narrow the number of potential solutions. However, you would only be able to solve for a linear function, not higher-order terms. For higher-order terms, there is still an infinite number of solutions (imagine an inverted quadratic function going through those points, or a cubic function). (Also, this ignores random error.)
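
The regression intuition can be checked numerically. The sketch below is an illustration only, with made-up x values: it computes the rank of the polynomial design matrix and compares it with the number of coefficients. Whenever the rank is smaller than the number of coefficients, infinitely many fits pass through the points exactly. `design_rank` is a hypothetical helper name.

```python
import numpy as np


def design_rank(x, degree):
    """Rank of the polynomial design matrix with columns x**degree, ..., x, 1."""
    A = np.vander(np.asarray(x, dtype=float), degree + 1)
    return np.linalg.matrix_rank(A)


# One point, straight line (2 coefficients): rank 1, so the line is not unique.
rank_one_point_line = design_rank([2.0], degree=1)

# Two points, straight line (2 coefficients): rank 2, so the line is unique.
rank_two_points_line = design_rank([2.0, 5.0], degree=1)

# Two points, quadratic (3 coefficients): rank 2, so the curve is not unique.
rank_two_points_quadratic = design_rank([2.0, 5.0], degree=2)
```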

We can apply this intuition to parametric survival analysis models. Neither the Exponential nor the Weibull fitter should converge with a single event, since the data support infinitely many solutions. For two data points, the exponential model may converge since it estimates only a single parameter (I wouldn’t trust the estimate though, since there is little support for it). The Weibull should not converge, since we are trying to estimate two parameters from two data points. All of the above is a simple heuristic to help understand what the underlying mathematics are doing for the models.
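
For the data set in the question, this can also be seen directly from the likelihood. The sketch below is not lifelines' internal code. It assumes the common parameterization S(t) = exp(-(t / lam) ** rho) and, for each shape `rho`, plugs in the scale that maximizes the log-likelihood at that shape. `profile_loglik` is a hypothetical helper name. If the resulting profile keeps increasing as `rho` grows, there is no finite maximum for an optimizer to converge to.

```python
import numpy as np

# Data from the question: one observed event, four censored observations.
T = np.array([13467, 13760, 12011, 7798, 7928], dtype=float)
E = np.array([0, 1, 0, 0, 0], dtype=bool)


def profile_loglik(rho, T, E):
    """Weibull log-likelihood at shape rho, with the scale set to its best value.

    Assumes S(t) = exp(-(t / lam) ** rho). For fixed rho the maximizing scale
    satisfies lam ** rho = sum(T ** rho) / d, where d is the number of events.
    """
    d = E.sum()
    log_t = np.log(T)
    # log(sum(T ** rho)), computed via log-sum-exp so large rho does not overflow.
    a = rho * log_t
    log_sum = a.max() + np.log(np.exp(a - a.max()).sum())
    return d * np.log(rho) + (rho - 1) * log_t[E].sum() - d * (log_sum - np.log(d)) - d


shapes = [1, 10, 100, 1000]
values = [profile_loglik(rho, T, E) for rho in shapes]
still_increasing = all(b > a for a, b in zip(values, values[1:]))
```

With this data the single event is also the largest duration, and the profile log-likelihood grows without bound as `rho` increases, which is consistent with the fitter failing rather than settling on an estimate.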

1 reaction
pzivich commented, Nov 14, 2018

No problem, @hcarlens. Glad to help! As for the number of observations, I am unaware of any heuristic for the number of observations for a Weibull model. As always, the more events there are to fit the model to, the better. Realistically, Weibull models can fit a few observations fairly well, since only two parameters are estimated. However, this depends on the variance of the data (large variation may not allow a good model fit to only a few events).

One good practice is to check your Weibull model by first estimating the survival distribution non-parametrically via Kaplan-Meier (KaplanMeierFitter in lifelines). First, plot the survival curve to make sure it has some general shape that makes sense (no large drop-offs or other oddities). Afterwards, you can use the plot_loglogs() function within KaplanMeierFitter to generate a log-log plot. In this plot, you are looking for a straight-line relationship between the X and Y axes. If there is one, that suggests the Weibull model is a good parametric model choice. If there isn’t, you would want to use another parametric model or use non-parametric methods like Kaplan-Meier.

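Below is some sample code you can adapt to your purpose:

import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

# Example data taken from the question above; replace with your own DataFrame,
# which needs a duration column 'T' and an event-indicator column 'E'
df = pd.DataFrame({'T': [13467, 13760, 12011, 7798, 7928],
                   'E': [0, 1, 0, 0, 0]})

# Fitting Kaplan-Meier
km = KaplanMeierFitter()
km.fit(durations=df['T'], event_observed=df['E'])

# Plotting survival curve
km.plot()
plt.show()

# Plot log-log transformed survival curve
km.plot_loglogs()
plt.show()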
