
Make explicit that cross_val* functions use unshuffled KFold CV

See original GitHub issue

Describe the workflow you want to enable

Many algorithms that use cross-validation accept a plain integer for the cv parameter for convenience. This creates a KFold or StratifiedKFold with the appropriate number of folds. Many estimators and folding strategies also support a random_state parameter for reproducibility; however, reproducibility can be broken if the folding strategy does not also have its random_state set.
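As a minimal sketch (assuming scikit-learn is installed), the integer-to-splitter conversion can be inspected directly with check_cv, and the resulting folds are contiguous, unshuffled blocks:

```python
import numpy as np
from sklearn.model_selection import check_cv

# Passing an integer to cv is equivalent to an unshuffled KFold
# (StratifiedKFold for classifiers); check_cv performs that conversion.
cv = check_cv(cv=5)
print(type(cv).__name__, cv.shuffle)  # KFold False

# With shuffle=False the test folds are contiguous blocks of indices,
# unaffected by any random_state set on the estimator itself.
X = np.arange(10).reshape(-1, 1)
for _, test_idx in cv.split(X):
    print(test_idx)  # [0 1], [2 3], [4 5], [6 7], [8 9]
```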

Describe your proposed solution

Allow estimators to pass their random_state object to check_cv, so that if a user sets random_state, everything is completely reproducible even when an integer was used for cv.
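Note that check_cv does not currently accept a random_state; the workaround sketched below (under that assumption) is to pass an explicitly seeded splitter instead of an integer, which makes shuffled folds reproducible today:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(random_state=0, max_iter=1000)

# Current workaround: instead of cv=5, pass a splitter whose own
# random_state is pinned, so shuffled folds repeat across runs.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores_a = cross_val_score(clf, X, y, cv=cv)
scores_b = cross_val_score(clf, X, y, cv=cv)
assert (scores_a == scores_b).all()  # identical folds, identical scores
```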


Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
NicolasHug commented, Mar 25, 2021

Perhaps we should extend the current docs (addition in italics):

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. *These splitters are instantiated with shuffle=False so the splits will be the same across calls.*
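The quoted behaviour is easy to check (a sketch assuming scikit-learn): two independent unshuffled splitters produce identical splits, whereas shuffle=True without a fixed random_state generally does not.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(-1, 1)

# shuffle=False (the default, and what cv=<int> produces): the split
# order is deterministic, so independent splitters agree exactly.
a = [test for _, test in KFold(n_splits=5).split(X)]
b = [test for _, test in KFold(n_splits=5).split(X)]
assert all((x == y).all() for x, y in zip(a, b))

# By contrast, KFold(n_splits=5, shuffle=True) with no random_state
# draws a fresh permutation per splitter, so splits differ between runs.
```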

0 reactions
glemaitre commented, Apr 9, 2021

This issue has been addressed in #19776.


Top Results From Across the Web

  • How to Configure k-Fold Cross-Validation: First, let's define a synthetic classification dataset that we can use as the basis of this tutorial. The make_classification() function can be ...
  • Complete guide to Python's cross-validation with examples: We will learn: What is KFold, ShuffledKfold and StratifiedKfold and see how they differ; How to cross validate your model without KFold using...
  • Cross-Validation in Machine Learning: How to Do It Right: Cross-validation is a technique for evaluating a machine learning model and testing its performance. CV is commonly used in applied ML tasks ...
  • K-Fold Cross Validation - Python Example - Data Analytics: The k-fold cross-validation technique can be implemented easily using Python with scikit learn (Sklearn) package which provides an easy way ...
  • 3.1. Cross-validation: evaluating estimator performance: Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just...
