KFold or RepeatedKFold with Incremental estimator
Hi,
Is there an easy way to do KFold or RepeatedKFold over an Incremental estimator (e.g. SGDRegressor)?
As I understand it, IncrementalSearchCV yields a set of hyperparameters optimized against a single, fixed validation set (controlled by test_size). What I would like is to run this optimization against different validation sets, as is typically done with KFold, something like:
GridSearchCV(Incremental(SGDRegressor(...)), params, cv=KFold(10))
Any help would be really appreciated, thanks!
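For concreteness, here is a minimal sketch of the pattern being asked about, assuming dask-ml's Incremental wrapper and scikit-learn's GridSearchCV; the hyperparameter grid is illustrative only, and whether this composition actually works is exactly the open question:

```python
# Hypothetical sketch of the desired workflow -- whether GridSearchCV composes
# with Incremental like this is what this issue is asking about.
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV, KFold
from dask_ml.wrappers import Incremental

# Parameters of the wrapped estimator are addressed via the estimator__ prefix.
params = {"estimator__alpha": [1e-4, 1e-3, 1e-2]}

search = GridSearchCV(
    Incremental(SGDRegressor(tol=1e-3)),
    params,
    cv=KFold(n_splits=10),
)
# search.fit(X, y)  # ideally, each of the 10 folds would be fit via partial_fit
```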
Issue Analytics
- Created 5 years ago
- Comments: 11 (6 by maintainers)
Top GitHub Comments
Your understanding is correct. Right now the strategy is to persist the test dataset once and use it for all the calls to partial_fit.
One thing I’m not sure about is how allowing multiple CV passes over the data would change the meaning of parameters like patience and max_iter. If you specify, say, cv=5, do you actually get 5 CV splits, or if max_iter is hit, do you stop?
Can I ask: what’s the motivation for multiple CV splits? Have you found it useful on large datasets in practice? See also https://github.com/dask/dask-ml/issues/303
In the past I’ve run into some issues with grid search and incremental. I haven’t dug into the details yet, though. I’ll put it on my todo list if no one beats me to it.