CoxTimeVaryingFitter is actually faster than CoxPHFitter...

CoxPHFitter test

    import pandas as pd
    import time

    from lifelines import CoxPHFitter
    from lifelines.datasets import load_rossi

    df = load_rossi()
    df = pd.concat([df] * 20)
    cp = CoxPHFitter()
    start_time = time.time()
    cp.fit(df, duration_col="week", event_col="arrest")
    print("--- %s seconds ---" % (time.time() - start_time))
    cp.print_summary()

This takes about 2.3 seconds.

CoxTimeVaryingFitter test

    import time
    import pandas as pd
    from lifelines import CoxTimeVaryingFitter
    from lifelines.datasets import load_rossi
    from lifelines.utils import to_long_format

    df = load_rossi()
    df = pd.concat([df] * 20)
    df = df.reset_index()
    df = to_long_format(df, duration_col='week')
    ctv = CoxTimeVaryingFitter()
    start_time = time.time()
    ctv.fit(df, id_col="index", event_col="arrest", start_col="start", stop_col="stop")
    time_took = time.time() - start_time
    print("--- %s seconds ---" % time_took)
    ctv.print_summary()

This takes about 1.65 seconds.

Note that the two datasets are identical, and so are the results (as expected). The internal difference is that CoxPHFitter processes each row individually, while CoxTimeVaryingFitter processes all rows grouped by duration. The latter is much more efficient when there are lots of ties (i.e. when the number of unique durations is low relative to the row count).
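
To make the row-by-row vs. grouped-by-duration distinction concrete, here is a minimal pure-Python sketch. It is not the lifelines implementation: it assumes a single covariate, a fixed coefficient `beta`, and Breslow's tie handling for simplicity, and the function names and toy data are made up for illustration. The per-row version is deliberately naive. Both functions compute the same partial log-likelihood, but the grouped one touches each unique duration only once.

```python
import math
from collections import defaultdict


def breslow_loglik_per_row(durations, events, x, beta):
    """Visit every row; rebuild the risk-set sum for each event row."""
    total = 0.0
    for t_i, e_i, x_i in zip(durations, events, x):
        if not e_i:
            continue  # censored rows only contribute through risk sets
        # Risk set: every subject still under observation at time t_i.
        risk_sum = sum(
            math.exp(beta * x_j) for t_j, x_j in zip(durations, x) if t_j >= t_i
        )
        total += beta * x_i - math.log(risk_sum)
    return total


def breslow_loglik_grouped(durations, events, x, beta):
    """Visit each unique duration once, from latest to earliest."""
    groups = defaultdict(list)
    for t, e, x_i in zip(durations, events, x):
        groups[t].append((e, x_i))

    total = 0.0
    risk_sum = 0.0  # running sum over all rows with duration >= t
    for t in sorted(groups, reverse=True):
        rows = groups[t]
        risk_sum += sum(math.exp(beta * x_i) for _, x_i in rows)
        n_events = sum(1 for e, _ in rows if e)
        if n_events:
            event_x = sum(x_i for e, x_i in rows if e)
            # All tied events at t share one risk-set sum (Breslow).
            total += beta * event_x - n_events * math.log(risk_sum)
    return total


# Hypothetical toy data with heavy ties: 8 rows, 3 unique durations.
durations = [5, 5, 5, 8, 8, 12, 12, 12]
events = [1, 0, 1, 1, 1, 0, 1, 1]
x = [0.2, -1.0, 0.7, 1.5, -0.3, 0.0, 0.9, -0.6]
beta = 0.4

per_row = breslow_loglik_per_row(durations, events, x, beta)
grouped = breslow_loglik_grouped(durations, events, x, beta)
unique_ratio = len(set(durations)) / len(durations)

print(per_row, grouped, unique_ratio)
```

The lower `unique_ratio` is, the fewer iterations the grouped loop needs relative to the row count, which is the situation in the replicated rossi dataset above.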

This is kinda shocking to me. It means I can improve CoxPHFitter performance by like 30%.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

2 reactions
CamDavidsonPilon commented, Jan 22, 2019

Down to ~0.17 with #609

0 reactions
CamDavidsonPilon commented, Jan 3, 2019

Down to 0.67 with #595
