
Parameter searches: folds used in evaluate_candidates aren't consistent across calls

See original GitHub issue

If I use a custom CV iterator with shuffle=True, random_state=None, then different folds will be generated for each call to evaluate_candidates().

This isn’t an issue for GridSearchCV or RandomizedSearchCV, since these only call evaluate_candidates() once. But it is fundamentally wrong for e.g. Successive Halving #13900, which repeatedly calls evaluate_candidates() but assumes the folds are always the same (if resource != n_samples). This is even more of an issue if we implement warm start for SH: estimators would not be warm-started on the same data.
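
For illustration (this snippet is not part of the original issue), a minimal sketch of the behavior being described: with shuffle=True and random_state=None, each call to split() reshuffles the samples, so two successive evaluate_candidates() calls would score candidates on different folds.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features

# shuffle=True with random_state=None: each split() call reshuffles
cv = KFold(n_splits=5, shuffle=True, random_state=None)

first = [test for _, test in cv.split(X)]
second = [test for _, test in cv.split(X)]

# The two lists of test folds generally differ, which is the inconsistency
# reported here for repeated evaluate_candidates() calls.
print(all(np.array_equal(a, b) for a, b in zip(first, second)))  # usually False
```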

Pinging @jnothman in particular, I think this is something you noted before (but I can’t remember where, sorry)?

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 16 (16 by maintainers)

Top GitHub Comments

1 reaction
thomasjpfan commented, Oct 10, 2019

I would prefer storing the seed. The RandomState.get_state() tuple is quite big (compared to a seed); see the numpy reference.
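
A rough sketch of the "store a seed" idea (make_stable_cv is a hypothetical helper, not scikit-learn API): draw one small integer seed from the user-supplied random_state up front and reuse it, so every later split() call produces identical folds.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.utils import check_random_state

def make_stable_cv(n_splits, random_state=None):
    # Accept an int, a RandomState instance, or None, then fix a single
    # integer seed: cheap to store, unlike the full get_state() tuple.
    rng = check_random_state(random_state)
    seed = rng.randint(np.iinfo(np.int32).max)
    return KFold(n_splits=n_splits, shuffle=True, random_state=seed)

cv = make_stable_cv(5, random_state=None)
X = np.arange(20).reshape(10, 2)

folds_a = [test for _, test in cv.split(X)]
folds_b = [test for _, test in cv.split(X)]
# With a fixed integer seed, repeated split() calls return the same folds.
assert all(np.array_equal(a, b) for a, b in zip(folds_a, folds_b))
```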

0 reactions
GaelVaroquaux commented, Oct 30, 2019

That would work too, though I don’t think we should prevent users from using a RandomState instance or None. It’s convenient to just define an instance at the top of your file and pass it down to each call, e.g. random_state=rng.

I would not mind erroring on this with a good error message.
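
Another option compatible with keeping RandomState instances and None, sketched here only as an illustration of the discussion (the evaluate_candidates stub below is a stand-in, not the actual scikit-learn internals): materialize the folds once and reuse the cached list across calls.

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.RandomState()              # the convenient "rng at the top of the file"
cv = KFold(n_splits=5, shuffle=True, random_state=rng)

X = np.arange(20).reshape(10, 2)
cached_splits = list(cv.split(X))          # freeze the folds once

def evaluate_candidates(candidates, splits=cached_splits):
    # Every call reuses the same (train, test) index pairs, so successive
    # rounds of a search evaluate candidates on identical folds.
    for train_idx, test_idx in splits:
        ...
```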

Read more comments on GitHub >

Top Results From Across the Web

3.2. Tuning the hyper-parameters of an estimator - Scikit-learn
Hyper-parameters are parameters that are not directly learnt within estimators. In scikit-learn they are passed as arguments to the constructor of the ...
Read more >
5 Model Training and Tuning | The caret Package - Github Sites
5.1 Model Training and Parameter Tuning. The caret package has several functions that attempt to streamline the model building and evaluation process.
Read more >
Hyperparameter Tuning with Grid Search and Random Search
If we decide to use cross validation (let's say with 5 folds) this means grid search will have to evaluate 1200 (=240*5) model...
Read more >
Model evaluation, model selection, and algorithm selection in ...
And in contrast to the repeated holdout method, which we discussed in Part II, test folds in k-fold cross-validation are not overlapping. In ...
Read more >
