Bad nested CV example? It looks like each CV loop does the same exact split on the data

I think this example of nested cross-validation (from the main site) isn’t implementing nested cross-validation properly. The example uses exactly the same data and makes exactly the same splits at each level of the nested CV. This is inconsistent with my understanding of how nested CV should be implemented (see my question on Cross Validated and its top answers).

# Loop for each trial
for i in range(NUM_TRIALS):

    # Choose cross-validation techniques for the inner and outer loops,
    # independently of the dataset.
    # E.g "LabelKFold", "LeaveOneOut", "LeaveOneLabelOut", etc.
    inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

    # Non_nested parameter search and scoring
    clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)

    clf.fit(X_iris, y_iris) ## <<<-- LINE A
    non_nested_scores[i] = clf.best_score_

    # Nested CV with parameter optimization
    nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)  ## <<<-- LINE B
    nested_scores[i] = nested_score.mean()

Note that the two lines above, which I marked A and B, set up the same K folds (i.e. the same random state i) on the same exact data (X_iris and y_iris):

  • clf.fit(X_iris, y_iris)
  • nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)

This isn’t consistent with a typical nested CV implementation, which would instead run a full inner CV loop on the training data within each fold of the outer CV loop.
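For reference, the structure described here can be written out as an explicit loop. This is only a minimal sketch: the `SVC(kernel="rbf")` estimator and the `p_grid` values are assumptions standing in for `svr` and `p_grid` from the snippet, which the issue does not define.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Assumed search space; the issue does not show the actual p_grid.
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)

outer_scores = []
for train_idx, test_idx in outer_cv.split(X):
    # The inner search only ever sees the outer training portion,
    # so inner_cv re-splits that subset rather than the full data.
    search = GridSearchCV(SVC(kernel="rbf"), param_grid=p_grid, cv=inner_cv)
    search.fit(X[train_idx], y[train_idx])

    # Score the refit best model on the held-out outer fold.
    outer_scores.append(search.score(X[test_idx], y[test_idx]))

nested_score = float(np.mean(outer_scores))
```

Even though `inner_cv` and `outer_cv` share a random state, the inner splitter is applied to a different (smaller) array inside each outer fold, so the two levels do not end up with identical splits.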

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 12 (8 by maintainers)

Top GitHub Comments

amelio-vazquez-reina commented, Aug 1, 2017 (1 reaction)

Phenomenal, thank you. This clarifies it:

cross_val_score fits the model again, for each split of the data.

I didn’t know cross_val_score would re-fit, as opposed to just evaluating the already-fitted clf on the CV provided (sorry, I always use GridSearchCV here, so I wasn’t sure about it). 👍
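One way to see the re-fitting behaviour for yourself is to pass cross_val_score a search object that was never fitted. The sketch below assumes an `SVC` estimator and a small illustrative grid; the variable names are made up for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
cv = KFold(n_splits=4, shuffle=True, random_state=0)

# Deliberately never call fit() on this search object.
unfitted_search = GridSearchCV(
    SVC(kernel="rbf"), param_grid={"C": [1, 10, 100]}, cv=cv
)

# cross_val_score clones the estimator and fits the clone on the
# training part of each split, so an unfitted estimator is fine.
scores = cross_val_score(unfitted_search, X, y, cv=cv)

# The original object was only cloned, so it carries no fitted state.
still_unfitted = not hasattr(unfitted_search, "best_params_")
```

Because the work happens on clones, whatever an earlier `clf.fit(...)` call learned is not reused by cross_val_score.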

jnothman commented, Aug 1, 2017 (0 reactions)

I think we should just construct the grid search again in the cross_val_score call.
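A rough sketch of what that suggestion could look like, so the nested step visibly uses its own search object. The `make_search` helper, the `SVC` estimator, and the grid values are hypothetical and not taken from the scikit-learn example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# Assumed search space for illustration only.
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)


def make_search():
    """Hypothetical helper: build a fresh, unfitted grid search."""
    return GridSearchCV(SVC(kernel="rbf"), param_grid=p_grid, cv=inner_cv)


# Non-nested: one search over the full data.
non_nested_score = make_search().fit(X_iris, y_iris).best_score_

# Nested: a newly constructed search is handed to cross_val_score,
# which fits a clone of it inside every outer training fold.
nested_scores = cross_val_score(make_search(), X=X_iris, y=y_iris, cv=outer_cv)
nested_score = nested_scores.mean()
```

The behaviour should match passing the already-fitted `clf`, since cross_val_score clones its estimator either way; the separate construction mainly makes that independence obvious to the reader.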

