Add `return_predictions` option to the model_selection.cross_validate() API
The `cross_validate()`, `cross_val_predict()`, and `cross_val_score()` functions from the `model_selection` module provide a highly compact and convenient API for most day-to-day work. Yet oftentimes one would like to get 1) multiple scores for both the train and test sets, and 2) predictions for all samples under cross-validation, i.e. the aggregation of the test-fold predictions from all folds.
The current API doesn’t seem to allow both goals to be achieved in a single CV run. Would it be possible to add something like a `return_predictions` option to the `model_selection.cross_validate()` API, to simply allow access to these results (which I believe have already been computed inside the function)?
Thank you!
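For context, a minimal sketch (not from the original issue) of what this currently takes with the existing scikit-learn API: two separate cross-validation passes over the same splitter, one for the scores and one for the out-of-fold predictions. The dataset and estimator are placeholders.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict, cross_validate

X, y = load_iris(return_X_y=True)
est = LogisticRegression(max_iter=1000)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# 1) Train and test scores per fold (one full CV run).
scores = cross_validate(est, X, y, cv=cv, return_train_score=True)

# 2) Out-of-fold predictions for every sample (a second, redundant CV run:
#    each fold's estimator is fitted again from scratch).
preds = cross_val_predict(est, X, y, cv=cv)
```

The duplication is the pain point behind the request: every fold's estimator is fitted twice, once per helper, even though a single run could expose both results.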
Issue Analytics
- Created: 5 years ago
- Reactions: 5
- Comments: 23 (22 by maintainers)
Top GitHub Comments
I don’t really like returning the indices and models. You’re still requiring the user to reimplement quite a bit, so why wouldn’t they just completely reimplement it as @mostafahadian did?
I would favor adding a `return_predictions` (and possibly a `predict_method`) option to `cross_validate`, which would provide per-fold predictions. I think the user can be trusted to hstack them.
[After reading the above more carefully: as usual, I agree with @jnothman.]
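To make the per-fold-predictions idea concrete, here is a rough hand-rolled sketch of what that suggestion amounts to today with a plain split loop; the proposed option would simply hand back the per-fold arrays that the user otherwise has to build themselves. The dataset and estimator are again placeholders.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
est = LogisticRegression(max_iter=1000)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

fold_preds, fold_test_idx = [], []
for train_idx, test_idx in cv.split(X, y):
    model = clone(est).fit(X[train_idx], y[train_idx])
    fold_preds.append(model.predict(X[test_idx]))
    fold_test_idx.append(test_idx)

# "hstack" the per-fold outputs, then restore the original sample order
# using the recorded test indices.
all_preds = np.concatenate(fold_preds)
order = np.argsort(np.concatenate(fold_test_idx))
oof_preds = all_preds[order]
```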
I think wanting to get predictions as well as scores is a common use-case, albeit one open to misuse. To some extent I’d rather have `return_predictions`, which would return test-set predictions as well as indices (under a separate key), than `return_includes`. One reason is that there’s no harm in including CV indices if returning predictions, and it might help guide users towards reasonable usage patterns. One reason is that some things should be returned without being asked for, and multi-metric scoring naturally expands the set of returned keys. One reason is that it’s better for `return_train_score` to be available, consistent with `*SearchCV`.
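Purely as an illustration of the "predictions plus indices under a separate key" shape being discussed, here is a small helper one could layer on top of such a result. The `predictions` and `test_indices` keys, and the `return_predictions` behaviour they imply, are invented for this sketch and are not part of any released scikit-learn API.

```python
import numpy as np

def assemble_oof_predictions(cv_results, n_samples):
    """Rebuild full-length out-of-fold predictions from per-fold pieces.

    Hypothetical: assumes cross_validate(..., return_predictions=True)
    would return per-fold test predictions under "predictions" and the
    matching test indices under "test_indices"; both key names are
    invented for this sketch, not an existing scikit-learn API.
    """
    preds = np.concatenate(cv_results["predictions"])
    idx = np.concatenate(cv_results["test_indices"])
    out = np.empty(n_samples, dtype=preds.dtype)
    out[idx] = preds
    return out
```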