Separate train and test prediction/scoring inside BaseSearchCV

Describe the workflow you want to enable

Separate train and test scoring methods in BaseSearchCV.

Describe your proposed solution

Currently, in BaseSearchCV, both train and test scoring are done inside _fit_and_score, which does not allow separate behaviour for train and test. Instead, the steps that predict on the train set and on the test set could be moved into separate methods of BaseSearchCV, which would make subclassing easy.

Describe alternatives you’ve considered, if relevant

None

Additional context

The use case is pipelines that should behave differently in train vs test. One could subclass BaseSearchCV and override the train_score and test_score methods so that each first sets some relevant parameter to enable the train or test behaviour, then calls the original scoring method. Any existing BaseSearchCV subclass could then be subclassed to inherit from this new class and pick up that behaviour.

This would not change any current sklearn behaviour, but it would save a lot of work and code for anyone who wishes to extend the current BaseSearchCV subclasses.
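
A rough sketch of the idea, in plain Python rather than scikit-learn code. Everything here is hypothetical and only illustrates the shape of the proposal: SketchSearchCV stands in for BaseSearchCV, fit_and_score stands in for _fit_and_score, the train_score and test_score hooks do not exist in sklearn today, and the "stage" attribute is an invented example of "some relevant parameter".

```python
# Hypothetical sketch only: none of these classes or hooks exist in scikit-learn.

def neg_mean_absolute_error(estimator, X, y):
    """Toy scorer with the (estimator, X, y) calling convention."""
    predictions = estimator.predict(X)
    return -sum(abs(p - t) for p, t in zip(predictions, y)) / len(y)


class SketchSearchCV:
    """Stand-in for BaseSearchCV with train and test scoring split out."""

    def __init__(self, estimator, scorer):
        self.estimator = estimator
        self.scorer = scorer

    def train_score(self, estimator, X, y):
        # Proposed hook: score on the training split.
        return self.scorer(estimator, X, y)

    def test_score(self, estimator, X, y):
        # Proposed hook: score on the held-out split.
        return self.scorer(estimator, X, y)

    def fit_and_score(self, X_train, y_train, X_test, y_test):
        # Stand-in for _fit_and_score: fit once, then call each hook.
        self.estimator.fit(X_train, y_train)
        return {
            "train_score": self.train_score(self.estimator, X_train, y_train),
            "test_score": self.test_score(self.estimator, X_test, y_test),
        }


class StageAwareSearchCV(SketchSearchCV):
    """Subclass that flips a (hypothetical) stage flag before each scoring."""

    def train_score(self, estimator, X, y):
        estimator.stage = "train"
        return super().train_score(estimator, X, y)

    def test_score(self, estimator, X, y):
        estimator.stage = "test"
        return super().test_score(estimator, X, y)


class MeanPredictor:
    """Toy estimator that records which stage each predict call ran in."""

    def __init__(self):
        self.stage = None
        self.seen_stages = []

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        self.seen_stages.append(self.stage)
        return [self.mean_] * len(X)


model = MeanPredictor()
search = StageAwareSearchCV(model, neg_mean_absolute_error)
scores = search.fit_and_score([[0], [1], [2]], [1, 2, 3], [[3], [4]], [2, 2])
print(scores, model.seen_stages)
```

The base class keeps today's behaviour because both hooks call the same scorer; only the subclass changes what happens before each one.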

Issue Analytics

State:
Created 3 years ago
Comments:16 (16 by maintainers)

Top GitHub Comments

hsorsky commented, Oct 30, 2020

OK, thanks for your thoughts. Re it being only a partial solution of a specific use case, yep makes sense and I completely understand the resistance. My only counter offer would be that I’m happy to do the work for the PR, but understand this still means you guys need to go through the review process.

As for pipelines being stage-aware, although it's not on your roadmap, I think it's something that should be, especially as compute increases and the use of things like stacking grows with it. I'd be interested in being involved in, or starting, the discussion should there be one.

Thanks for your time!

jnothman commented, Nov 13, 2020

Yes, I think this need (effectively, to use a generated dataset at training time) is handled by the resampling functionality, similar to using kfold y in stacking.
