predict_median is sometimes a poor default for k_fold_cross_validation


Currently, when one calls k_fold_cross_validation on a Cox model, the concordance is calculated using predict_median. For problems where the majority of cases survive the entire study duration, this can be problematic: a subject whose predicted survival curve never drops below 0.5 gets a median survival time of inf, so many cases end up tied at inf. (For a flavor of how severe this is, it knocked a full 0.10 off the concordance index for a problem I’m working on, which really threw me for a loop until I did my own split-sample testing.)

I guess it’s hard to find a default here that works well, but using predict_median for a Cox model leads to some pretty strange behavior. Here are a couple of solutions I can think of (I’d be happy to implement one but want to run it by others first):

  • Default to predict_partial_hazard (or rather, its negative) for the Cox model and predict_median otherwise
  • Allow each model to implement a predict_rank function and use that by default
  • Switch from predict_median to something less likely to generate infs (perhaps predict_expectation? not sure how well this would work for non-Cox models)
  • Have k_fold_cross_validation issue a warning if it seems like there are too many ties in the output of predict_median

Which one seems best?
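
To make the effect of the inf ties concrete, here is a minimal sketch. It does not call lifelines: `concordance` is a hand-rolled, simplified Harrell-style index written for this illustration (tied scores count as half-concordant, pairs tied in time are skipped), and the times, partial hazards, and medians are made-up numbers, not output from a fitted model.

```python
import math
from itertools import combinations


def concordance(times, scores, events):
    """Simplified Harrell-style concordance (illustrative, not lifelines' code).

    `scores` must be oriented so that a higher score means longer survival.
    A pair is comparable only if the subject with the shorter time had an
    observed event. Tied scores contribute 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the subject with the shorter recorded time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # skip time ties and pairs where the earlier subject is censored
        comparable += 1
        if scores[a] < scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5
    return concordant / comparable


# Hypothetical study that ends at t = 10; half the subjects are censored there.
times = [2, 4, 6, 10, 10, 10]
events = [1, 1, 1, 0, 0, 0]

# Made-up model outputs: the partial hazard is distinct for every subject,
# but the predicted median is inf whenever the curve never crosses 0.5.
partial_hazard = [3.0, 2.0, 1.2, 0.7, 0.5, 0.3]
median = [3.0, 8.0, math.inf, math.inf, math.inf, math.inf]

c_hazard = concordance(times, [-h for h in partial_hazard], events)
c_median = concordance(times, median, events)

print(c_hazard)  # 1.0
print(c_median)  # 0.875
```

Both score vectors encode the same ordering wherever the median is finite, yet the three comparable pairs tied at inf each score 0.5 instead of 1, which pulls the index from 1.0 down to 0.875 in this toy case.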

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
emindeniz commented, Dec 11, 2017

Ok, makes perfect sense. It is just a way to order subjects so that we can check the concordance between ordering by the model and ordering by the data.

0 reactions
CamDavidsonPilon commented, Dec 11, 2017

“Could you please explain why the predict_partial_hazard function gives predicted survival times?”

It doesn’t. It simply gives a scalar that (because of how it’s used in the Cox model) can be seen as an ordinal value (higher scalar == faster to die).
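
For readers who want the reasoning behind "higher scalar == faster to die", the standard proportional hazards relations can be written out as follows. The notation (baseline hazard h_0, baseline survival S_0, coefficients β) is the usual textbook form and is an assumption here, not something quoted from the thread; any centering of covariates applied by a particular implementation does not change the ordering.

```latex
% Cox proportional hazards model: hazard for a subject with covariates x
h(t \mid x) = h_0(t)\, \exp(x^\top \beta)

% Implied survival function, with S_0 the baseline survival
S(t \mid x) = S_0(t)^{\exp(x^\top \beta)}

% Because 0 \le S_0(t) \le 1, a larger partial hazard gives a survival
% curve that is lower (or equal) at every time t:
\exp(x_i^\top \beta) > \exp(x_j^\top \beta)
  \;\Longrightarrow\;
  S(t \mid x_i) \le S(t \mid x_j) \quad \text{for all } t
```

So the negative of the partial hazard ranks subjects by expected survival without ever producing inf, which is all the concordance index needs.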

