CoxPH throws 'delta contains nan value(s). Convergence halted'

There was an earlier issue for this, but it was closed with the advice to ensure that no columns had constant values. None of my columns have constant values (my data is attached compressed; GitHub wouldn’t let me attach it as a 2,000-row CSV).

I’m able to fit an AalenAdditiveFitter to this data using the code below:

    model = AalenAdditiveFitter()
    model.fit(df, duration_col='duration', event_col='event_observed')  # df: the attached data as a DataFrame (name assumed)

but I get “ValueError: delta contains nan value(s). Convergence halted.” when fitting a CoxPH model with similar code:

    model = CoxPHFitter()
    model.fit(df, duration_col='duration', event_col='event_observed')  # df: the attached data as a DataFrame (name assumed)

Thanks for this library!


Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

CamDavidsonPilon commented, Nov 14, 2017

Here was my work (this is using the latest lifelines, 0.12, released recently):

import pandas as pd
from lifelines import CoxPHFitter
cp = CoxPHFitter()
data = pd.read_csv("lifelines_df_colnames_masked.csv", sep="|")
data = data.dropna()
cp.fit(data, 'duration', 'event_observed')

I got a lifelines warning about two columns, and my fitting failed with a NaN calc. However the warning tells us what the problem is:

lifelines/utils/ RuntimeWarning: Column(s) ['dummy1', 'dummy7'] have very low variance. This may harm convergence. Try dropping this redundant column before fitting if convergence fails.
  warnings.warn(warning_text, RuntimeWarning)

So I dropped those:

data = data.drop(['dummy1', 'dummy7'], axis=1)
cp.fit(data, 'duration', 'event_observed')
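
To see which columns would trigger a low-variance warning like the one above before fitting, a check along these lines may help. This is only a sketch on toy data: the helper name `low_variance_columns`, the toy column names, and the `threshold` value are assumptions for illustration, not lifelines' own check or cutoff.

```python
import pandas as pd

def low_variance_columns(df, exclude=(), threshold=0.01):
    """Return names of numeric columns whose sample variance is below `threshold`.

    `threshold` is an illustrative value, not the cutoff lifelines uses.
    """
    covariates = df.drop(columns=list(exclude))
    variances = covariates.var(numeric_only=True)
    return variances[variances < threshold].index.tolist()

# Toy data: 'dummy_rare' is 1 in a single row out of 200.
toy = pd.DataFrame({
    "duration": range(1, 201),
    "event_observed": [1, 0] * 100,
    "dummy_common": [0, 1] * 100,
    "dummy_rare": [1] + [0] * 199,
})

# Skip the duration and event columns; only covariates are checked.
flagged = low_variance_columns(toy, exclude=["duration", "event_observed"])
print(flagged)
```

Any column it flags is a candidate for dropping before the fit.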

Again, the fitting failed, this time with a singular-matrix error. This implies some linear dependence among the columns.

data.corr()  # noticed num7 and num6 had perfect correlation with num5
data = data.drop(['num7', 'num6'], axis=1)
cp.fit(data, 'duration', 'event_observed')
cp.print_summary()
n=10186, number of events=3772

          coef  exp(coef)  se(coef)        z      p  lower 0.95  upper 0.95
num1    0.0592     1.0610    0.0213   2.7753 0.0055      0.0174      0.1010   **
num2    0.1001     1.1053    0.0207   4.8350 0.0000      0.0595      0.1407  ***
num3    0.0348     1.0354    0.0311   1.1163 0.2643     -0.0263      0.0958
dummy2 -1.0526     0.3490    1.0025  -1.0500 0.2937     -3.0179      0.9127
dummy3 -1.0857     0.3377    1.0025  -1.0830 0.2788     -3.0509      0.8796
dummy4 -1.0163     0.3619    1.0016  -1.0146 0.3103     -2.9799      0.9473
dummy5 -1.1313     0.3226    1.0420  -1.0857 0.2776     -3.1739      0.9114
dummy6 -1.0859     0.3376    1.0304  -1.0539 0.2919     -3.1058      0.9340
num4   -0.0052     0.9949    0.0004 -11.5358 0.0000     -0.0060     -0.0043  ***
num5   -0.3604     0.6974    0.0364  -9.9046 0.0000     -0.4317     -0.2890  ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Concordance = 0.580
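
Scanning the output of `data.corr()` by eye gets tedious with many columns. The sketch below lists column pairs whose absolute correlation is numerically 1. The helper name `perfectly_correlated_pairs`, the tolerance `tol`, and the toy columns are assumptions for illustration. It only catches pairwise dependence, so a column that is a linear combination of several others would need a different check (for example, a matrix rank test).

```python
import pandas as pd

def perfectly_correlated_pairs(df, tol=1e-8):
    """Return (col_a, col_b) pairs whose absolute Pearson correlation is within `tol` of 1."""
    corr = df.corr().abs()
    cols = corr.columns
    pairs = []
    # Walk the upper triangle only, so each pair is reported once.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] >= 1 - tol:
                pairs.append((cols[i], cols[j]))
    return pairs

toy = pd.DataFrame({"num_a": [1.0, 2.0, 3.0, 4.0, 5.0]})
toy["num_b"] = 2 * toy["num_a"] + 1         # exact linear function of num_a
toy["num_c"] = [2.0, 1.0, 4.0, 3.0, 6.0]    # correlated with num_a, but not perfectly

pairs = perfectly_correlated_pairs(toy)
print(pairs)
```

Dropping one column from each reported pair removes that source of singularity.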

MaxPowerWasTaken commented, Nov 15, 2017

Thanks Cam,

Yeah the 100% correlation between some columns (‘num7’, ‘num6’) is caused by my doing development on a very small amount of data. In my last comment, I had dropped the same columns you do above, but got stuck at

RuntimeWarning: invalid value encountered in sqrt
se = np.sqrt(inv(-self.hessian).diagonal()) / self._norm_std

After upgrading to lifelines 0.12 (I had the latest conda-forge version, 0.11.2) and dropping the low-variance variables you dropped (‘dummy1’, ‘dummy7’), I can now reproduce your results. Those “low-variance” dummies were almost always 0 in this split of the data, but not across all of my data.

With cross-validation I don’t always know in advance which dummy vars might be ‘low variance’ for a given data-split. I use a cross-validated pipeline and wanted to use a thinly-wrapped lifelines model as my estimator at the end of the pipeline. But I can’t dynamically drop whichever dummy-columns are low-variance at the end of my pipeline because the final-step (the model) needs to see the same columns in fit and predict. I’ll give this some more thought and maybe post a SO question if I can’t figure it out. Thanks again for the help and the library.

Feel free to close this issue
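
One possible way around the cross-validation problem described above is to decide which columns to drop during fit, remember that decision, and apply the same drop at predict time, so the model sees identical columns in both steps. The sketch below shows the idea on toy data. `LowVarianceDropper`, its `threshold`, and the toy frames are hypothetical names and values for illustration, not part of lifelines. In practice the duration and event columns would be left out of the variance check.

```python
import pandas as pd

class LowVarianceDropper:
    """Hypothetical helper: learn the columns to drop on the training split only."""

    def __init__(self, threshold=0.01):
        self.threshold = threshold  # illustrative cutoff, not lifelines' own

    def fit(self, df):
        variances = df.var(numeric_only=True)
        # Remember the decision so the same columns are dropped later.
        self.dropped_ = variances[variances < self.threshold].index.tolist()
        return self

    def transform(self, df):
        # Apply the decision made at fit time, whatever this split looks like.
        return df.drop(columns=self.dropped_)

# 'dummy1' is constant in the training split but not in the held-out split.
train = pd.DataFrame({"num1": [0.1, 0.9, 0.4, 0.7],
                      "dummy1": [0, 0, 0, 0],
                      "dummy2": [0, 1, 0, 1]})
held_out = pd.DataFrame({"num1": [0.3, 0.8],
                         "dummy1": [1, 0],
                         "dummy2": [1, 0]})

dropper = LowVarianceDropper().fit(train)
train_X = dropper.transform(train)
held_out_X = dropper.transform(held_out)
print(dropper.dropped_, list(train_X.columns), list(held_out_X.columns))
```

scikit-learn's `VarianceThreshold` follows the same fit/transform pattern and could fill this role as a pipeline step placed before the model.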
