predict_median is sometimes a poor default for k_fold_cross_validation
Currently, when one calls `k_fold_cross_validation` on a Cox model, the concordance is calculated using `predict_median`. For problems where the majority of cases survive the entire study duration, this can be problematic. Many cases end up with a median survival time of `inf`, because their predicted survival curve never drops below 0.5 within the observed time range, and all of those `inf`s are tied with one another. (For a flavor of how severe this is, it knocked a full 0.10 off the concordance index for a problem I’m working on, which really threw me for a loop until I did my own split-sample testing.)
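To see why tied `inf` medians cost so much, here is a minimal pure-Python sketch of Harrell's concordance index. It is a simplified stand-in, not the lifelines implementation: pairs with tied durations are skipped, and tied scores count as half-concordant. It assumes, as the "(or rather, its negative)" remark below suggests, that a higher score should mean longer survival. The durations, events, partial hazards and medians are made-up illustration data, not numbers from the issue.

```python
import math

def harrell_c(durations, scores, events):
    """Simplified Harrell's C: a higher score should mean longer survival.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter duration than subject j. Tied scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(durations)):
        for j in range(len(durations)):
            if events[i] and durations[i] < durations[j]:
                comparable += 1
                if scores[i] < scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:  # inf == inf counts as a tie
                    concordant += 0.5
    return concordant / comparable

# Illustrative data: three deaths, three subjects censored at study end (t=10).
durations = [2, 4, 6, 10, 10, 10]
events = [1, 1, 1, 0, 0, 0]

# Cox-style score: negate the partial hazard so that higher == longer survival.
partial_hazards = [3.0, 2.0, 1.5, 1.0, 0.8, 0.5]
risk_scores = [-h for h in partial_hazards]

# Predicted medians: most survival curves never drop below 0.5, so they are inf.
medians = [5.0] + [math.inf] * 5

print(f"{harrell_c(durations, risk_scores, events):.3f}")  # 1.000
print(f"{harrell_c(durations, medians, events):.3f}")      # 0.708
```

Both score vectors order the subjects consistently with the data; the only difference is that the median-based one collapses five subjects onto the same `inf`, turning seven concordant pairs into half-credit ties (8.5 / 12 instead of 12 / 12).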
I guess it’s hard to find a default here that works well for every model, but using `predict_median` for a Cox model leads to some pretty strange behavior. Here are a few solutions I can think of (I’d be happy to implement one, but want to run it by others first):
- Default to `predict_partial_hazard` (or rather, its negative) for the Cox model and `predict_median` otherwise.
- Allow each model to implement a `predict_rank` function and use that by default.
- Switch from `predict_median` to something less likely to generate `inf`s (perhaps `predict_expectation`? Not sure how well this would work for non-Cox models).
- Have `k_fold_cross_validation` issue a warning if there seem to be too many ties in the output of `predict_median`.
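Options 1 and 4 are easy to prototype outside the library. The sketch below is a hypothetical helper: `ranking_scores` and `tie_fraction` are illustrative names, not lifelines functions, and the 20% tie threshold is an arbitrary assumption. It takes already-computed partial hazards and predicted medians as plain lists.

```python
import math
import warnings

def tie_fraction(preds):
    """Fraction of predictions that share their exact value with another one."""
    counts = {}
    for p in preds:
        counts[p] = counts.get(p, 0) + 1  # every inf lands on the same key
    tied = sum(c for c in counts.values() if c > 1)
    return tied / len(preds)

def ranking_scores(is_cox, partial_hazards, medians, max_tie_fraction=0.2):
    """Hypothetical helper sketching options 1 and 4 from the list above.

    Option 1: for a Cox model, score by the negated partial hazard, so a
    higher score means longer predicted survival.
    Option 4: otherwise keep the predicted medians, but warn when too many
    of them are tied (for example, because they are all inf).
    """
    if is_cox:
        return [-h for h in partial_hazards]
    frac = tie_fraction(medians)
    if frac > max_tie_fraction:
        warnings.warn(
            f"{frac:.0%} of predicted medians are tied; "
            "the concordance index may be biased toward 0.5",
            RuntimeWarning,
        )
    return list(medians)

medians = [5.0] + [math.inf] * 5
print(ranking_scores(True, [3.0, 2.0, 1.5], medians))  # [-3.0, -2.0, -1.5]
ranking_scores(False, [], medians)  # warns: "83% of predicted medians are tied; ..."
```

Warning rather than silently switching scores keeps the current default intact while still flagging the failure mode described above.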
Which one seems best?
Issue analytics: created 8 years ago; 6 comments (4 by maintainers).
Top GitHub Comments
Ok, makes perfect sense. It is just a way to order subjects so that we can check the concordance between ordering by the model and ordering by the data.
It doesn’t; it simply gives a scalar that (because of how it’s used in the Cox model) can be seen as an ordinal value (higher scalar == faster to die).
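The comments make the point that the partial hazard is a relative risk score, not a time. Assuming the standard Cox form h(t | x) = h0(t) exp(x·β), a minimal sketch with made-up coefficients shows that exp(x·β) orders subjects exactly like the linear predictor x·β, because exp is strictly increasing. Only that ordering matters for a concordance index.

```python
import math

# Hypothetical fitted coefficients and covariates (illustration only).
beta = [0.7, -0.3]
covariates = [
    [1.0, 2.0],  # subject 0
    [0.0, 1.0],  # subject 1
    [2.0, 0.5],  # subject 2
]

def linear_predictor(x, beta):
    """x . beta for a single subject."""
    return sum(xi * bi for xi, bi in zip(x, beta))

def partial_hazard(x, beta):
    """exp(x . beta): the covariate part of h(t | x) = h0(t) * exp(x . beta).

    It has no time unit; it only says how risky a subject is relative to others.
    """
    return math.exp(linear_predictor(x, beta))

def order_by_risk(values):
    """Subject indices from highest score (predicted to die first) to lowest."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)

hazards = [partial_hazard(x, beta) for x in covariates]
linear = [linear_predictor(x, beta) for x in covariates]

# exp is strictly increasing, so both give the same ordering; for a
# concordance index where higher == longer survival, pass the negated values.
print(order_by_risk(hazards))  # [2, 0, 1]
print(order_by_risk(linear))   # [2, 0, 1]
```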