Parameter searches: folds used in evaluate_candidates aren't consistent across calls
See original GitHub issue

If I use a custom CV iterator with shuffle=True, random_state=None, then different folds will be generated for each call to evaluate_candidates().

This isn't an issue for GridSearchCV or RandomizedSearchCV, since these only call evaluate_candidates() once. But it is fundamentally wrong for e.g. Successive Halving #13900, which repeatedly calls evaluate_candidates() but assumes the folds are always the same (if resource != n_samples). This is even more of an issue if we implement warm start for SH: estimators would not be warm-started on the same data.

ping in particular @jnothman, I think this is something you noted before (but can't remember where, sorry)?
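A minimal sketch (not from the issue itself) of the behaviour described above, assuming a plain KFold splitter stands in for the custom CV iterator: with shuffle=True and random_state=None, two passes over the same splitter draw different permutations, which is exactly what repeated evaluate_candidates() calls would observe.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)

# With shuffle=True and random_state=None, each call to split() draws a fresh
# permutation, so two passes over the same CV object yield different folds.
cv = KFold(n_splits=5, shuffle=True, random_state=None)
first_pass = [test for _, test in cv.split(X)]
second_pass = [test for _, test in cv.split(X)]
print(all(np.array_equal(a, b) for a, b in zip(first_pass, second_pass)))
# almost certainly False

# Fixing random_state makes the folds identical across calls.
cv_seeded = KFold(n_splits=5, shuffle=True, random_state=0)
first_pass = [test for _, test in cv_seeded.split(X)]
second_pass = [test for _, test in cv_seeded.split(X)]
print(all(np.array_equal(a, b) for a, b in zip(first_pass, second_pass)))
# True
```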
Issue Analytics
- Created: 4 years ago
- Comments: 16 (16 by maintainers)
I would prefer storing the seed. The RandomState.get_state() tuple is quite big compared to a seed (see the numpy reference).

I would not mind erroring on this with a good error message.
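A hypothetical sketch of the "store the seed" idea, not the actual scikit-learn implementation: draw one integer seed up front and re-seed the splitter with it before every pass, so the folds stay identical across repeated evaluate_candidates() calls.

```python
import numpy as np
from sklearn.model_selection import KFold

# Draw a single integer seed once and keep it; this is much smaller than the
# full RandomState.get_state() tuple and is enough to reproduce the folds.
seed = np.random.randint(np.iinfo(np.int32).max)

def make_cv():
    # Re-creating the splitter with the stored seed yields the same folds
    # every time, unlike shuffle=True with random_state=None.
    return KFold(n_splits=5, shuffle=True, random_state=seed)

X = np.arange(40).reshape(20, 2)
folds_call_1 = [test for _, test in make_cv().split(X)]
folds_call_2 = [test for _, test in make_cv().split(X)]
assert all(np.array_equal(a, b) for a, b in zip(folds_call_1, folds_call_2))
```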