predict_median is sometimes a poor default for k_fold_cross_validation

Currently, when one calls k_fold_cross_validation on a Cox model, the concordance is calculated using predict_median. For problems where the majority of cases survive the entire study duration, this can be problematic, as many cases end up with a median survival time of inf and are therefore tied with one another. (For a flavor of how severe this is, it knocked a full 0.10 off the concordance index for a problem I’m working on, which really threw me for a loop until I did my own split-sample testing.)
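To make the effect concrete, here is a minimal sketch with a hand-written, simplified concordance index (not the library's implementation) and made-up data. It assumes that tied predictions count as half-concordant, which is the usual convention for Harrell's C. All numbers, including the predicted medians and risk scores, are hypothetical.

```python
import math

def concordance_index(durations, scores, observed):
    """Simplified Harrell's C. A higher score should mean longer survival.

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's recorded time. Tied scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(durations)):
        for j in range(len(durations)):
            if observed[i] and durations[i] < durations[j]:
                comparable += 1
                if scores[i] < scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical study: six subjects, the last two censored.
durations = [1, 2, 3, 4, 5, 6]
observed = [True, True, True, True, False, False]

# Hypothetical model output: risk scores that rank every subject correctly...
risk = [3.0, 2.5, 2.0, 1.5, 1.0, 0.5]
# ...but predicted medians that are inf for everyone except the two riskiest.
median = [4.0, 5.0, math.inf, math.inf, math.inf, math.inf]

c_from_risk = concordance_index(durations, [-r for r in risk], observed)
c_from_median = concordance_index(durations, median, observed)
print(c_from_risk, c_from_median)
```

In this toy case the same underlying ranking scores 1.0 when the negated risk is used but only about 0.82 when the median is used, purely because the inf values are tied.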

I guess it’s hard to find a default here that works well, but using predict_median for a Cox model leads to some pretty strange behavior. Here are a couple of solutions I can think of (I’d be happy to implement one but want to run it by others first):

- Default to predict_partial_hazard (or rather, its negative) for the Cox model and predict_median otherwise
- Allow each model to implement a predict_rank function and use that by default
- Switch from predict_median to something less likely to generate infs (perhaps predict_expectation? not sure how well this would work for non-Cox models)
- Have k_fold_cross_validation issue a warning if it seems like there are too many ties in the output of predict_median
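As an illustration of the last option, a tie check could look roughly like the sketch below. The function name warn_if_many_ties and the 0.5 threshold are hypothetical choices made for this example, not anything that exists in the library.

```python
import math
import warnings
from collections import Counter

def warn_if_many_ties(predictions, max_tied_fraction=0.5):
    """Warn when one value dominates the predictions.

    Returns the share of predictions equal to the most common value.
    Both the name and the default threshold are hypothetical.
    """
    most_common_value, count = Counter(predictions).most_common(1)[0]
    tied_fraction = count / len(predictions)
    if tied_fraction > max_tied_fraction:
        warnings.warn(
            f"{tied_fraction:.0%} of predictions equal {most_common_value}; "
            "the concordance index may be unreliable."
        )
    return tied_fraction

# Hypothetical predicted medians where most subjects outlive the study.
medians = [4.0, 5.0, math.inf, math.inf, math.inf, math.inf]
fraction = warn_if_many_ties(medians)
```

A check like this would not fix the score, but it would point the user at the cause instead of leaving them with an unexplained drop in concordance.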

Which one seems best?

Issue Analytics

Created 8 years ago
Comments: 6 (4 by maintainers)

Top GitHub Comments

emindeniz commented, Dec 11, 2017

Ok, makes perfect sense. It is just a way to order subjects so that we can check the concordance between ordering by the model and ordering by the data.

CamDavidsonPilon commented, Dec 11, 2017

“Could you please explain why predict_partial_hazard function gives predicted survival times?”

It doesn’t. It simply gives a scalar that, because of how it’s used in the Cox model, can be treated as an ordinal value (higher scalar == faster to die).
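A small sketch of that idea, using the textbook form of the Cox partial hazard, exp(x · beta). The coefficients and covariates are invented for illustration, and the library may center or scale covariates differently, so treat this as an assumption-laden toy rather than the library's computation.

```python
import math

# Hypothetical fitted coefficients and subjects (not taken from the issue).
beta = {"age": 0.04, "treated": -0.7}
subjects = {
    "a": {"age": 70, "treated": 0},
    "b": {"age": 55, "treated": 1},
    "c": {"age": 62, "treated": 0},
}

def partial_hazard(covariates):
    """Textbook Cox partial hazard: exp of the linear predictor."""
    return math.exp(sum(beta[name] * covariates[name] for name in beta))

hazards = {name: partial_hazard(x) for name, x in subjects.items()}

# The scalar has no time units; it only orders subjects.
# Highest hazard first means "expected to die soonest" first.
expected_order = sorted(hazards, key=hazards.get, reverse=True)

# Negating the hazard gives a score where larger means longer survival,
# which is the direction a concordance calculation on survival times expects.
scores = {name: -h for name, h in hazards.items()}
print(expected_order)
```

None of these values is a survival time, yet the ordering alone is enough to compute concordance against the observed durations.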