Note about appropriate and inappropriate uses of cross_val_predict
Users are tempted to run `metric(y_true, cross_val_predict(...))`. This is not equivalent to `cross_val_score(...)`, which takes an average over cross-validation folds; nor is it a standard or appropriate measure of generalisation error.
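A minimal sketch of the distinction (the dataset, estimator, and CV settings here are illustrative, not taken from the issue):

```python
# Illustrative sketch: the two quantities are computed differently and
# generally differ; only the second is a per-fold generalisation estimate.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=5000)

# Pools out-of-fold predictions, then scores once on the pooled predictions.
pooled = accuracy_score(y, cross_val_predict(clf, X, y, cv=5))

# Scores each fold separately, then averages the per-fold scores.
per_fold_mean = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()

print(pooled, per_fold_mean)  # usually close, but not the same quantity
```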
I consider valid uses of `cross_val_predict` to be:
- visualisation of training set predictions
- model blending: in an ensemble where a “downstream” estimator learns from the predictions of an “upstream” estimator, the downstream estimator should be trained neither on the true labels nor on the upstream estimator’s predictions on its own training data, as both are too biased. `cross_val_predict` is not biased in the same way (see the sketch after this list).
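A minimal hand-rolled blending sketch under that reasoning (the estimators and data below are placeholders; scikit-learn’s StackingClassifier builds on the same idea internally):

```python
# Illustrative blending sketch: train the downstream model on out-of-fold
# predictions of the upstream model rather than on its in-sample predictions
# or on the raw labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=500, random_state=0)

upstream = RandomForestClassifier(random_state=0)

# Out-of-fold probabilities: each sample is predicted by a model that did not
# see it during fitting, so the downstream features are much less biased.
oof_proba = cross_val_predict(upstream, X, y, cv=5, method="predict_proba")

downstream = LogisticRegression()
downstream.fit(oof_proba, y)

# At prediction time, the upstream model is refit on all of the training data.
upstream.fit(X, y)
```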
Do you agree with this summary, @GaelVaroquaux?
Issue Analytics
- Created: 6 years ago
- Comments: 13 (11 by maintainers)
Top Results From Across the Web

- Why should we use cross_val_predict instead of just normally ...
  "Yes, we don't use cross_val_predict for that. You use cross validation to compare performance of different estimators and select the best ..."
- 3.1. Cross-validation: evaluating estimator performance
  "Note on inappropriate usage of cross_val_predict ... Thus, cross_val_predict is not an appropriate measure of generalization error."
- Calculate evaluation metrics using cross_val_predict sklearn
  "Warning Note on inappropriate usage of cross_val_predict ... Thus, cross_val_predict is not an appropriate measure of generalisation error."
- Linear Regression >> Cross-Validation, Readings ...
  "Warning Note on inappropriate usage of cross_val_predict: ... It is a good alternative to k-fold cross-validation."
- Cross-validation: evaluating estimator performance -- sklearn
  "Note on inappropriate usage of cross_val_predict ... Thus, cross_val_predict is not an appropriate measure of generalization error."
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
On the contrary, that distribution is a good indication that either your CV strategy is inappropriate, that you do not have enough data to get a good estimate of generalisation error, or that your generalisation error is truly highly variable.
I agree that it doesn’t make sense for all cases, but why would it not make sense for something like accuracy?
If your CV method has imbalanced folds, for instance:

- Acc: 0.67 on 20 test samples
- Acc: 0.70 on 20 test samples
- Acc: 0.80 on 10 test samples

simply averaging them would give the third fold disproportionate importance in the final accuracy value.
Wouldn’t running `accuracy_score(y_true, cross_val_predict(...))` give a more accurate value in this case?
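To make that arithmetic concrete, a worked sketch using the fold accuracies and sizes quoted above (not code from the thread):

```python
# Worked example with the fold accuracies and fold sizes quoted above.
fold_acc = [0.67, 0.70, 0.80]
fold_size = [20, 20, 10]

# Unweighted mean over folds (averaging per-fold scores):
unweighted = sum(fold_acc) / len(fold_acc)  # ~0.723

# Pooled accuracy, i.e. total correct / total samples, which for accuracy is
# what accuracy_score(y_true, cross_val_predict(...)) would compute here:
pooled = sum(a * n for a, n in zip(fold_acc, fold_size)) / sum(fold_size)  # 0.708

print(unweighted, pooled)
```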