Instability in the Weibull fits

I have been testing this package as a replacement for JMP software to fit Weibull distributions to a set of interval-censored data. The results do not seem to be stable. When I compare them with the JMP results, the log-likelihood from JMP is always larger. I have narrowed down the root cause to some extent.

There are two problems as far as I can tell:

  • When the fail times vary by a few orders of magnitude, numerical convergence is difficult. An example of this situation is the following: [(1, 10), (10, 100), (100, 1000)]
  • When there is little data available. Take the extreme case of a single censored observation, [(1, 10)]. In this case, the log-likelihood (LL) function is very flat and it is hard to get good convergence.
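
For reference, the quantity being maximized in both cases can be written out as below. This is a sketch under assumptions: it uses the (scale λ, shape ρ) parameterization of the Weibull survival function and treats each observation as an interval (l_i, u_i], with u_i = ∞ for right-censored samples. The symbols are illustrative and are not taken from the issue.

```latex
S(t) = \exp\!\left[-\left(\frac{t}{\lambda}\right)^{\rho}\right],
\qquad
\ell(\lambda, \rho) = \sum_{i=1}^{n} \log\!\left[ S(l_i) - S(u_i) \right],
\qquad
S(\infty) = 0 .
```

With a single interval, the sum has one term, and any (λ, ρ) that puts most of the probability mass inside that interval gives nearly the same value, which is consistent with the flatness described above.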

To get a better solution (the definition of "better" is very subjective here), I have found that it helps to optimize the scale factor in log space, meaning using log(scale_factor) as the optimization variable, and to use 1/shape_factor as the other optimization variable.
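
A minimal sketch of this reparameterization, assuming NumPy and SciPy are available. It is not the lifelines or JMP implementation; the function names `neg_log_likelihood` and `fit_weibull_interval`, the starting point, and the choice of Nelder-Mead are all illustrative assumptions. Lower bounds are assumed to be strictly positive.

```python
import numpy as np
from scipy.optimize import minimize


def neg_log_likelihood(params, lower, upper):
    """Negative interval-censored Weibull log-likelihood.

    params = (log(scale), 1/shape). Use upper = np.inf for right censoring.
    """
    log_scale, inv_shape = params
    if inv_shape <= 0:
        return np.inf  # keep the search inside the valid region
    shape = 1.0 / inv_shape
    # Cumulative hazard (t / scale) ** shape, evaluated in log space.
    with np.errstate(over="ignore", divide="ignore"):
        cum_hazard_lower = np.exp(shape * (np.log(lower) - log_scale))
        cum_hazard_upper = np.exp(shape * (np.log(upper) - log_scale))
    # P(lower < T <= upper) = S(lower) - S(upper)
    prob = np.exp(-cum_hazard_lower) - np.exp(-cum_hazard_upper)
    # Clip so that a zero probability gives a large finite penalty, not -inf.
    return -np.sum(np.log(np.clip(prob, 1e-300, None)))


def fit_weibull_interval(lower, upper):
    """Return (scale, shape, negative log-likelihood) at the optimum found."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Hypothetical starting point: geometric midpoint of the intervals, shape = 1.
    finite_upper = np.where(np.isfinite(upper), upper, lower)
    start = np.array([np.mean(np.log(np.sqrt(lower * finite_upper))), 1.0])
    result = minimize(neg_log_likelihood, start, args=(lower, upper),
                      method="Nelder-Mead")
    log_scale, inv_shape = result.x
    return np.exp(log_scale), 1.0 / inv_shape, result.fun


# The first example from the list above: intervals spanning three decades.
scale, shape, nll = fit_weibull_interval([1, 10, 100], [10, 100, 1000])
print(f"scale={scale:.4g}, shape={shape:.4g}, log-likelihood={-nll:.4f}")
```

Working with log(scale) keeps the scale positive without constraints and puts inputs that span several decades on a comparable footing. A derivative-free method is used here only to keep the sketch short.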

@CamDavidsonPilon I was wondering if you have any suggestions or thoughts on getting a more stable result in these cases.

Thanks,

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

2 reactions
pistacliffcho commented, Sep 25, 2019

Just a thought: the issue may be statistical rather than numerical. That is, censored-data models can suffer from the same issue as logistic regression when there is perfect separation in the data: the MLE of the coefficients is unbounded, yet the optimization algorithm will converge at finite values that might not seem huge to some users (e.g., 20), since the change in log-likelihood may be numerically 0 at that point. A specific issue that a Weibull model will face is that the scale/rate parameter will go to 0/infinity at the MLE and, like the coefficients, will never make it there.

From an algorithm’s perspective, this is not easy to detect, and it is usually considered the user’s responsibility (as is the case with glm in base R) to investigate whether this could be an issue. Without seeing the data, that would be my first guess about what is going on.
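
As a hedged illustration of an unbounded MLE, take the single-interval case (1, 10) mentioned in the issue. The sketch below assumes the survival function S(t) = exp(-(t/scale)^shape), fixes the scale at the geometric midpoint sqrt(10) (an arbitrary choice made for illustration), and evaluates the log-likelihood for increasing shape values. The helper name `interval_log_likelihood` is hypothetical.

```python
import numpy as np


def interval_log_likelihood(scale, shape, lower, upper):
    """Log of P(lower < T <= upper) for a Weibull(scale, shape) lifetime."""
    cum_hazard_lower = (lower / scale) ** shape
    cum_hazard_upper = (upper / scale) ** shape
    return np.log(np.exp(-cum_hazard_lower) - np.exp(-cum_hazard_upper))


scale = np.sqrt(10.0)  # geometric midpoint of the interval (1, 10)
shapes = [0.5, 1, 2, 5, 10, 20]
log_likelihoods = [interval_log_likelihood(scale, s, 1.0, 10.0) for s in shapes]

for s, ll in zip(shapes, log_likelihoods):
    print(f"shape={s:>4}: log-likelihood={ll:.3e}")
```

The log-likelihood keeps rising toward its supremum of 0 as the shape grows, but the increments quickly fall below typical optimizer tolerances, so a solver can stop at a finite shape value even though no finite maximizer exists.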

1 reaction
amirhosseindavoody commented, Oct 24, 2019

Okay, this is kind of overdue, but I have finally had time to do some more detailed analysis. Here is an example data set and the results from JMP and lifelines: example_fail_data.txt

I have used a scaling factor of 5000 for the start and stop times. The data include right-censored and interval-censored samples. Also, there are multiple samples (categories), and the Weibull analysis is performed for each category separately. Below you can see alpha_JMP versus alpha_lifelines and beta_JMP versus beta_lifelines. The line shows the one-to-one correspondence that we would see in the ideal case (matching results between JMP and lifelines).

[Images: plots of alpha_JMP versus alpha_lifelines and beta_JMP versus beta_lifelines]

In my analysis, the JMP result is always better, meaning it gives a larger log-likelihood for each category. I will show those results in another post.

@CamDavidsonPilon any ideas?
