Using weighted data in proportional_hazard_test() for CoxPH

I’ve been comparing CoxPH results from R’s survival package and Lifelines, and I’ve noticed large differences in the output of the proportionality test when I use weights instead of repeated rows. The hazard ratio estimates and CIs are very close, but the proportionality chisq is very different.

I’ve attached a CSV with sample data (as a .txt file because of GitHub).

The R (Survival) code is:

> hr_data <- lapply(hr, as.numeric) #only need this when using weights
> res.cox <- coxph(Surv(daysToEvent, hasOutcome) ~ cohort, data = hr_data, weights = count, robust = FALSE)
> x <- cox.zph(res.cox, transform = "km")
> summary(res.cox)
> print(x)

The Lifelines code is:

import math

from lifelines import CoxPHFitter
from lifelines.statistics import proportional_hazard_test

cph = CoxPHFitter()

cph.fit(
    df=hr_df,  # hr data as a dataframe
    duration_col="daysToEvent",
    event_col="hasOutcome",
    weights_col="count",
    show_progress=False,
    robust=False,
)

hazard_ratio = cph.hazard_ratios_.at["cohort"]
# confidence_intervals_ is on the coefficient (log hazard) scale, so exponentiate
lower_ci = math.exp(cph.confidence_intervals_.values[0][0])
upper_ci = math.exp(cph.confidence_intervals_.values[0][1])
proportional_test = proportional_hazard_test(cph, hr_df, time_transform="km")

For the attached data, using weights, I get from Lifelines:

"hazardRatio": 0.5070670424582165,
"lower_ci": 0.2374970008403611,
"upper_ci": 1.0826115051454892

"chisq": 6.030892236208823,
"dof": 1.0,
"p": 0.014057625681369732

from R:

       exp(coef) exp(-coef) lower .95 upper .95
cohort    0.5071      1.972    0.2375     1.083

       chisq df       p
cohort  30.3  1 3.7e-08

Whereas using a row per entry and no weights, I get from Lifelines:

"hazardRatio": 0.48501838000070696,
"lower_ci": 0.22532011968933832,
"upper_ci": 1.0440382743576246

"chisq": 29.78133316357846,
"dof": 1.0,
"p": 4.836262532790787e-08
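As a quick sanity check on the two Lifelines outputs above, the reported p-values can be recomputed from the reported chisq values alone. This is only a sketch: it assumes the statistic is compared against a chi-square distribution with 1 degree of freedom (matching the reported dof), and the helper name `chi2_sf_1dof` is made up for illustration and is not part of Lifelines.

```python
import math


def chi2_sf_1dof(chisq: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # With 1 dof, a chi-square variable is the square of a standard normal Z, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chisq / 2.0))


# (chisq, p) pairs copied from the Lifelines outputs quoted in the issue.
reported = {
    "weights": (6.030892236208823, 0.014057625681369732),
    "row per entry": (29.78133316357846, 4.836262532790787e-08),
}

for label, (chisq, p_reported) in reported.items():
    print(f"{label}: recomputed p = {chi2_sf_1dof(chisq):.6g}, reported p = {p_reported:.6g}")
```

If the recomputed and reported values agree in both cases, the p-values are consistent with their statistics, which would point at the chisq statistic itself as the place where the weighted and row-per-entry runs diverge.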

R:

       exp(coef) exp(-coef) lower .95 upper .95
cohort     0.485      2.062    0.2253     1.044

       chisq df       p
cohort  31.7  1 1.8e-08

So the hazard ratio values and errors are in good agreement, but the chi-square for proportionality is way off when using weights in Lifelines (6 vs 30). Thanks.

Attachment: hr.txt
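For anyone wanting to reproduce the "row per entry" variant from the weighted data, a minimal sketch of the expansion step might look like the following. The rows are hypothetical and not taken from hr.txt, the helper name `expand_weighted_rows` is invented for illustration, and the sketch assumes `count` holds non-negative whole numbers.

```python
from typing import Dict, List

Row = Dict[str, float]


def expand_weighted_rows(rows: List[Row], weight_key: str = "count") -> List[Row]:
    """Repeat each row `weight` times and drop the weight column."""
    expanded: List[Row] = []
    for row in rows:
        weight = row[weight_key]
        # Row repetition only makes sense for non-negative integer weights.
        if weight < 0 or weight != int(weight):
            raise ValueError(f"cannot expand non-integer or negative weight: {weight!r}")
        base = {key: value for key, value in row.items() if key != weight_key}
        expanded.extend(dict(base) for _ in range(int(weight)))
    return expanded


# Hypothetical rows using the column names from the issue (not real data).
weighted = [
    {"daysToEvent": 5, "hasOutcome": 1, "cohort": 0, "count": 3},
    {"daysToEvent": 9, "hasOutcome": 0, "cohort": 1, "count": 2},
]
per_entry = expand_weighted_rows(weighted)
print(len(per_entry), "rows after expansion")
```

With a pandas dataframe, something like `hr_df.loc[hr_df.index.repeat(hr_df["count"])]` should give the same expansion, again assuming integer counts. The expanded frame would then be fitted without `weights_col`.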

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

aongus commented, Jul 30, 2020 (1 reaction)

Hi @CamDavidsonPilon, thanks for figuring this out. I can see how these numbers would differ between regressors/implementations.

I guess, though, that from my perspective the more immediate issue was that using weighted vs unweighted data produced totally different results.

CamDavidsonPilon commented, Jun 17, 2020 (1 reaction)

Ack, sorry. It’s a high priority, but I’m stuck on it. I haven’t made much progress, unfortunately.
