*SearchCV should warn if calculating train score is unduly expensive
I suspect we made a mistake in letting return_train_score default to True in GridSearchCV (see #9619 for example), as it can sometimes be expensive to score over a training set.
I think it would be useful to users if we issue a warning (in _fit_and_score? at the end of fitting?) if train scoring is greater than ?10% of fit and test score time and is more than a few seconds for the entire grid search… or something.
It’s hard to come up with a precise heuristic, and it’s hard to know when to issue it.
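A minimal sketch of the kind of check described above, applied once to timings summed over the whole search. The function names, the 5-second floor, and the warning text are hypothetical; the issue only suggests roughly 10% and "a few seconds", and this is not how scikit-learn implements anything.

```python
import warnings

# Assumed thresholds: the issue floats "10%" and "a few seconds" without
# settling on values, so both numbers here are placeholders.
TRAIN_SCORE_FRACTION = 0.10
TRAIN_SCORE_MIN_SECONDS = 5.0


def train_scoring_is_expensive(fit_time, test_score_time, train_score_time,
                               fraction=TRAIN_SCORE_FRACTION,
                               min_seconds=TRAIN_SCORE_MIN_SECONDS):
    """Return True if total train-scoring time exceeds both thresholds.

    All arguments are totals in seconds, summed over every fit in the search.
    """
    baseline = fit_time + test_score_time
    return (train_score_time > fraction * baseline
            and train_score_time > min_seconds)


def maybe_warn_about_train_score(fit_time, test_score_time, train_score_time):
    """Emit a UserWarning when train scoring looks unduly expensive."""
    if train_scoring_is_expensive(fit_time, test_score_time, train_score_time):
        warnings.warn(
            "Computing training scores took %.1fs; consider setting "
            "return_train_score=False." % train_score_time,
            UserWarning,
        )
        return True
    return False
```

Requiring both conditions keeps the warning quiet for small searches, where train scoring may be a large fraction of the total but still only costs a moment.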
The other option is to make return_train_score=True stop being the default…
Issue Analytics
- State:
- Created 6 years ago
- Comments:11 (11 by maintainers)
Top GitHub Comments
No, we can’t print before scoring the training data. I’m not really expecting users to stop the process if it’s taking a long time, but to understand that it is lengthened by the option, and to change the option in the future. So even warning after all fits are done may suffice.
On 1 Sep 2017 10:18 pm, “Kumar Ashutosh” notifications@github.com wrote:
@amueller We need to print a warning before scoring the training data, right? And 5s after starting an estimator, right? I am a bit confused. And I agree with the idea of changing the default to “warn”.