predict_median is sometimes a poor default for k_fold_cross_validation
Currently, when one calls `k_fold_cross_validation` on a Cox model, the concordance is calculated using `predict_median`. For problems where the majority of cases survive the entire study duration, this can be problematic, as many cases end up having a median survival time of `inf`. (For a flavor of how severe this is, it knocked a full 0.10 off the concordance index for a problem I’m working on, which really threw me for a loop until I did my own split-sample testing.)
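A toy illustration of the effect, not lifelines code: the sketch below uses a hand-rolled concordance function (the name `concordance` and all data are made up for the example) with the usual convention that a tied prediction counts as half a concordant pair. When most predicted medians are `inf`, those subjects are all tied with each other and the score drops, even though the underlying risk ordering is perfect.

```python
import math
from itertools import combinations


def concordance(event_times, predicted_scores, event_observed):
    """Fraction of comparable pairs whose predictions are ordered like the data.

    Higher predicted_scores are expected to mean longer survival.
    Tied predictions count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(event_times)), 2):
        if event_times[i] == event_times[j]:
            continue  # tied observed times are skipped in this simplified version
        # a is the subject with the shorter observed time
        a, b = (i, j) if event_times[i] < event_times[j] else (j, i)
        if not event_observed[a]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if predicted_scores[a] < predicted_scores[b]:
            concordant += 1.0
        elif predicted_scores[a] == predicted_scores[b]:
            concordant += 0.5
    return concordant / comparable


# Made-up study: four events, four subjects censored at the end of follow-up.
times = [2, 4, 6, 8, 10, 10, 10, 10]
observed = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical model output: a risk score (higher == dies sooner) ...
risk = [2.0, 1.5, 1.2, 1.0, 0.8, 0.6, 0.4, 0.2]
# ... and predicted medians, which are infinite for everyone whose
# survival curve never drops below 0.5 within the study window.
medians = [3.0, 5.0] + [math.inf] * 6

c_risk = concordance(times, [-r for r in risk], observed)
c_median = concordance(times, medians, observed)
print(c_risk, c_median)
```

On this data the negated risk score orders every comparable pair correctly, while the median-based score loses credit on every pair where both predictions are `inf`.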
I guess it’s hard to find a default here that works well, but using `predict_median` for a Cox model leads to some pretty strange behavior. Here are a couple of solutions I can think of (I’d be happy to implement one but want to run it by others first):
- Default to `predict_partial_hazard` (or rather, its negative) for the Cox model and
- Allow each model to implement a `predict_rank` function and use that by default
- Switch from `predict_median` to something less likely to generate `inf` (`predict_expectation`? not sure how well this would work for non-Cox models)
- Have `k_fold_cross_validation` issue a warning if it seems like there are too many ties in the output of
Which one seems best?
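To make the tie-warning idea concrete, here is a minimal sketch of what such a check could look like. Nothing here is part of lifelines: `tied_fraction`, `warn_on_ties`, and the 25% threshold are all hypothetical names and values chosen for illustration.

```python
import math
import warnings
from collections import Counter


def tied_fraction(predictions):
    """Fraction of predictions that share their value with another prediction."""
    counts = Counter(predictions)
    tied = sum(c for c in counts.values() if c > 1)
    return tied / len(predictions)


def warn_on_ties(predictions, threshold=0.25):
    """Warn if more than `threshold` of the predictions are tied (hypothetical cutoff)."""
    frac = tied_fraction(predictions)
    if frac > threshold:
        warnings.warn(
            f"{frac:.0%} of predictions are tied; the concordance index "
            "may be misleading. Consider a different predictor.",
            RuntimeWarning,
        )
    return frac


# Six of eight predicted medians are infinite, so 75% of predictions are tied.
medians = [3.0, 5.0] + [math.inf] * 6
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    frac = warn_on_ties(medians)
print(frac, len(caught))
```

A check like this would only flag the symptom; it would still be up to the caller to pick a predictor that produces fewer ties.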
- Created 8 years ago
- Comments: 6 (4 by maintainers)
Top GitHub Comments
Ok, makes perfect sense. It is just a way to order subjects so that we can check the concordance between ordering by the model and ordering by the data.
Could you please explain why the predict_partial_hazard function gives predicted survival times?
It doesn’t; it simply gives a scalar that (because of how it’s used in the Cox model) can be seen as an ordinal value (higher scalar == faster to die).
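As a rough sketch of that point: in a Cox model the partial hazard of a subject is exp(x·β), which has no time units but does rank subjects by risk. The coefficients and covariates below are invented, and `partial_hazard` is a stand-in written for this example rather than the lifelines method.

```python
import math


def partial_hazard(covariates, coefficients):
    """exp(x . beta): a unitless relative risk, not a survival time."""
    return math.exp(sum(b * x for b, x in zip(coefficients, covariates)))


beta = [0.8, -0.5]  # hypothetical fitted coefficients
subjects = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}

scores = {name: partial_hazard(x, beta) for name, x in subjects.items()}

# Highest partial hazard first: the subject expected to die soonest.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)

# Negating the score gives something that increases with expected survival,
# which is the direction a concordance calculation on survival times expects.
time_like = {name: -s for name, s in scores.items()}
```

With lifelines itself, the equivalent would be to pass the negated output of `predict_partial_hazard` as the predicted scores to `lifelines.utils.concordance_index`.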