Separate train and test prediction/scoring inside BaseSearchCV

Describe the workflow you want to enable

Separate train and test scoring methods in BaseSearchCV.

Describe your proposed solution

Currently, in BaseSearchCV, train and test scoring both happen inside _fit_and_score, which does not allow separate behaviour for train and test. Instead, the steps that predict on the train set and on the test set could be moved into separate methods of BaseSearchCV, so that each can be overridden easily in a subclass.

Describe alternatives you’ve considered, if relevant

None

Additional context

The use case is a pipeline that should behave differently in train vs test. One could subclass BaseSearchCV and override the train_score and test_score methods so that each first sets some relevant parameter to enable the train- or test-specific behaviour, then calls the original scoring method. Any existing BaseSearchCV subclass could then be subclassed to inherit from this new class and pick up that behaviour.

This would not change current sklearn behaviour, but it would save a lot of work and code for anyone who wishes to extend the existing BaseSearchCV subclasses.
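
The proposal can be illustrated with a minimal, self-contained sketch. Nothing here is scikit-learn API: SketchSearchCV is a toy stand-in for BaseSearchCV, the train_score and test_score hooks are the methods the issue proposes (they do not exist in sklearn), and the `stage` attribute, the toy estimator, and the scorer are all hypothetical names chosen for illustration.

```python
# Hypothetical sketch only: none of these classes are scikit-learn APIs.


class SketchSearchCV:
    """Toy stand-in for BaseSearchCV with separate train/test scoring hooks."""

    def __init__(self, estimator, scorer):
        self.estimator = estimator
        self.scorer = scorer

    def train_score(self, estimator, X, y):
        # Proposed hook: scoring on the training split.
        return self.scorer(estimator, X, y)

    def test_score(self, estimator, X, y):
        # Proposed hook: scoring on the test split.
        return self.scorer(estimator, X, y)

    def fit_and_score(self, X_train, y_train, X_test, y_test):
        # Fit once, then score each split through its own hook.
        self.estimator.fit(X_train, y_train)
        return {
            "train_score": self.train_score(self.estimator, X_train, y_train),
            "test_score": self.test_score(self.estimator, X_test, y_test),
        }


class StageAwareSearchCV(SketchSearchCV):
    """Sets a (hypothetical) `stage` flag, then defers to the original scoring."""

    def train_score(self, estimator, X, y):
        estimator.stage = "train"
        return super().train_score(estimator, X, y)

    def test_score(self, estimator, X, y):
        estimator.stage = "test"
        return super().test_score(estimator, X, y)


class StageRecordingEstimator:
    """Toy estimator: predicts the training mean and records the active stage."""

    def __init__(self):
        self.stage = None
        self.stages_seen = []

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        self.stages_seen.append(self.stage)
        return [self.mean_ for _ in X]


def neg_mean_abs_error(estimator, X, y):
    # Scorer with the (estimator, X, y) calling convention; higher is better.
    preds = estimator.predict(X)
    return -sum(abs(p - t) for p, t in zip(preds, y)) / len(y)


est = StageRecordingEstimator()
search = StageAwareSearchCV(est, neg_mean_abs_error)
scores = search.fit_and_score([[0], [1]], [1.0, 3.0], [[2]], [2.0])
print(scores, est.stages_seen)
```

The subclass only overrides the two hooks, so the fitting and bookkeeping logic in the base class is reused unchanged, which is the saving the issue is after.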

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 16 (16 by maintainers)

Top GitHub Comments

1 reaction
hsorsky commented, Oct 30, 2020

OK, thanks for your thoughts. Re it being only a partial solution to a specific use case: yep, that makes sense, and I completely understand the resistance. My only counter-offer would be that I'm happy to do the work for the PR, but I understand this still means you guys need to go through the review process.

As for pipelines being stage-aware, although it's not on your roadmap, I think it's something that should be, especially as compute increases and the use of things like stacking grows with it. I'd be interested in joining or starting the discussion, should there be one.

Thanks for your time!

0 reactions
jnothman commented, Nov 13, 2020

Yes, I think this need to use what is effectively a generated dataset at training time is handled by the resampling functionality, similar to using kfold y in stacking.
