Add `return_predictions` option to the model_selection.cross_validate() API
The `cross_validate()`, `cross_val_predict()`, and `cross_val_score()` functions from the `model_selection` module provide a highly compact and convenient API for most day-to-day work. Yet oftentimes one would like to get 1) multiple scores for both the train and test sets, and 2) predictions for all samples under cross-validation, i.e. the aggregation of the test-fold predictions from all folds.
The current API doesn’t seem to allow both goals to be achieved in a single CV run. Would it be possible to add something like a `return_predictions` option to the `model_selection.cross_validate()` API, to simply allow access to these results (which I believe have already been computed inside the function)?
Thank you!
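For context, a minimal sketch (not from the original issue) of what this currently takes with the existing scikit-learn API: two separate cross-validation passes over the same splitter, one for the scores and one for the out-of-fold predictions. The dataset and estimator are placeholders.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict, cross_validate

X, y = load_iris(return_X_y=True)
est = LogisticRegression(max_iter=1000)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# 1) Train and test scores per fold (one full CV run).
scores = cross_validate(est, X, y, cv=cv, return_train_score=True)

# 2) Out-of-fold predictions for every sample (a second, redundant CV run:
#    each fold's estimator is fitted again from scratch).
preds = cross_val_predict(est, X, y, cv=cv)
```

The duplication is the pain point behind the request: every fold's estimator is fitted twice, once per helper, even though a single run could expose both results.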
Issue Analytics
- Created: 5 years ago
- Reactions: 5
- Comments: 23 (22 by maintainers)
Top GitHub Comments
I don’t really like returning the indices and models. You’re still requiring the user to reimplement quite a bit, so why wouldn’t they just completely reimplement it as @mostafahadian did?
I would favor adding a `return_predictions` (and possibly a `predict_method`) option to `cross_validate`, which would provide per-fold predictions. I think the user can be trusted to hstack them.
[After reading the above more carefully: as usual, I agree with @jnothman.]
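To make the per-fold-predictions idea concrete, here is a rough hand-rolled sketch of what that suggestion amounts to today with a plain split loop; the proposed option would simply hand back the per-fold arrays that the user otherwise has to build themselves. The dataset and estimator are again placeholders.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
est = LogisticRegression(max_iter=1000)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

fold_preds, fold_test_idx = [], []
for train_idx, test_idx in cv.split(X, y):
    model = clone(est).fit(X[train_idx], y[train_idx])
    fold_preds.append(model.predict(X[test_idx]))
    fold_test_idx.append(test_idx)

# "hstack" the per-fold outputs, then restore the original sample order
# using the recorded test indices.
all_preds = np.concatenate(fold_preds)
order = np.argsort(np.concatenate(fold_test_idx))
oof_preds = all_preds[order]
```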
I think wanting to get predictions as well as scores is a common use-case, albeit one open to misuse. To some extent I’d rather have `return_predictions`, which would return test-set predictions as well as indices (under a separate key), than `return_includes`. One reason is that there’s no harm in including CV indices if returning predictions, and it might help guide users towards reasonable usage patterns. One reason is that some things should be returned without being asked for, and multi-metric scoring naturally expands the set of returned keys. One reason is that it’s better for `return_train_score` to be available, consistent with `*SearchCV`.
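Purely as an illustration of the "predictions plus indices under a separate key" shape being discussed, here is a small helper one could layer on top of such a result. The `predictions` and `test_indices` keys, and the `return_predictions` behaviour they imply, are invented for this sketch and are not part of any released scikit-learn API.

```python
import numpy as np

def assemble_oof_predictions(cv_results, n_samples):
    """Rebuild full-length out-of-fold predictions from per-fold pieces.

    Hypothetical: assumes cross_validate(..., return_predictions=True)
    would return per-fold test predictions under "predictions" and the
    matching test indices under "test_indices"; both key names are
    invented for this sketch, not an existing scikit-learn API.
    """
    preds = np.concatenate(cv_results["predictions"])
    idx = np.concatenate(cv_results["test_indices"])
    out = np.empty(n_samples, dtype=preds.dtype)
    out[idx] = preds
    return out
```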