
Make explicit that cross_val* functions use unshuffled KFold CV

See original GitHub issue

Describe the workflow you want to enable

Many algorithms that use cross-validation accept a plain integer for the cv parameter for convenience. This creates a KFold or StratifiedKFold with the appropriate number of folds. Many estimators and folding strategies also support a random_state parameter for reproducibility; however, reproducibility can be broken if the folding strategy does not also have its random_state set.
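As a minimal sketch (assuming scikit-learn is installed), the integer-to-splitter conversion can be inspected directly with check_cv, and the resulting folds are contiguous, unshuffled blocks:

```python
import numpy as np
from sklearn.model_selection import check_cv

# Passing an integer to cv is equivalent to an unshuffled KFold
# (StratifiedKFold for classifiers); check_cv performs that conversion.
cv = check_cv(cv=5)
print(type(cv).__name__, cv.shuffle)  # KFold False

# With shuffle=False the test folds are contiguous blocks of indices,
# unaffected by any random_state set on the estimator itself.
X = np.arange(10).reshape(-1, 1)
for _, test_idx in cv.split(X):
    print(test_idx)  # [0 1], [2 3], [4 5], [6 7], [8 9]
```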

Describe your proposed solution

Allow estimators to pass their random_state object to check_cv, so that if a user sets random_state, everything is completely reproducible even when an integer was used for cv.
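Note that check_cv does not currently accept a random_state; the workaround sketched below (under that assumption) is to pass an explicitly seeded splitter instead of an integer, which makes shuffled folds reproducible today:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(random_state=0, max_iter=1000)

# Current workaround: instead of cv=5, pass a splitter whose own
# random_state is pinned, so shuffled folds repeat across runs.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores_a = cross_val_score(clf, X, y, cv=cv)
scores_b = cross_val_score(clf, X, y, cv=cv)
assert (scores_a == scores_b).all()  # identical folds, identical scores
```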


Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
NicolasHug commented, Mar 25, 2021

Perhaps we should extend the current docs (addition in italics):

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. *These splitters are instantiated with shuffle=False so the splits will be the same across calls.*
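The quoted behaviour is easy to check (a sketch assuming scikit-learn): two independent unshuffled splitters produce identical splits, whereas shuffle=True without a fixed random_state generally does not.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(-1, 1)

# shuffle=False (the default, and what cv=<int> produces): the split
# order is deterministic, so independent splitters agree exactly.
a = [test for _, test in KFold(n_splits=5).split(X)]
b = [test for _, test in KFold(n_splits=5).split(X)]
assert all((x == y).all() for x, y in zip(a, b))

# By contrast, KFold(n_splits=5, shuffle=True) with no random_state
# draws a fresh permutation per splitter, so splits differ between runs.
```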

0 reactions
glemaitre commented, Apr 9, 2021

This issue has been addressed in #19776.


Top Results From Across the Web

  • How to Configure k-Fold Cross-Validation: First, let's define a synthetic classification dataset that we can use as the basis of this tutorial. The make_classification() function can be ...
  • Complete guide to Python's cross-validation with examples: We will learn: What is KFold, ShuffledKfold and StratifiedKfold and see how they differ; How to cross validate your model without KFold using...
  • Cross-Validation in Machine Learning: How to Do It Right: Cross-validation is a technique for evaluating a machine learning model and testing its performance. CV is commonly used in applied ML tasks ...
  • K-Fold Cross Validation - Python Example - Data Analytics: The k-fold cross-validation technique can be implemented easily using Python with scikit learn (Sklearn) package which provides an easy way ...
  • 3.1. Cross-validation: evaluating estimator performance: Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just...
