Bad nested CV example? It looks like each CV loop does the same exact split on the data
I think this example of nested cross-validation (from the main site) isn't implementing nested cross-validation properly. The example uses the exact same data and performs the exact same splits at each level of the nested CV. This is inconsistent with my understanding of how nested CV should be implemented (see this question of mine on Cross Validated and the top answers).
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Setup as in the scikit-learn example (svr and p_grid assumed)
X_iris, y_iris = load_iris(return_X_y=True)
svr = SVC(kernel="rbf")
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}
NUM_TRIALS = 30
non_nested_scores = np.zeros(NUM_TRIALS)
nested_scores = np.zeros(NUM_TRIALS)

# Loop for each trial
for i in range(NUM_TRIALS):
    # Choose cross-validation techniques for the inner and outer loops,
    # independently of the dataset.
    # E.g. "LabelKFold", "LeaveOneOut", "LeaveOneLabelOut", etc.
    inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

    # Non-nested parameter search and scoring
    clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
    clf.fit(X_iris, y_iris)  # <<<-- LINE A
    non_nested_scores[i] = clf.best_score_

    # Nested CV with parameter optimization
    nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)  # <<<-- LINE B
    nested_scores[i] = nested_score.mean()
```
Note that the two lines above, which I labeled A and B, set up the same K folds (i.e., the same random state `i`) on the exact same data (`X_iris` and `y_iris`):

```python
clf.fit(X_iris, y_iris)                                               # LINE A
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)  # LINE B
```
This implementation isn't consistent with a typical nested CV implementation, which would instead run the full inner CV loop on the training data within each fold of the outer CV loop.
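A minimal sketch of the explicit nesting described above, where the inner hyper-parameter search is rerun from scratch on each outer training fold. The dataset, estimator, and parameter grid follow the scikit-learn example; the fold counts are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}  # assumed grid

outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)

outer_scores = []
for train_idx, test_idx in outer_cv.split(X):
    # Inner loop: grid search restricted to this outer training fold only
    search = GridSearchCV(SVC(kernel="rbf"), p_grid, cv=inner_cv)
    search.fit(X[train_idx], y[train_idx])
    # Evaluate the tuned model on the held-out outer fold
    outer_scores.append(search.score(X[test_idx], y[test_idx]))

print(np.mean(outer_scores))
```

Because the inner splits here only ever see the outer training fold, no test sample of the outer loop leaks into hyper-parameter selection.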
Issue Analytics
- Created 6 years ago
- Comments: 12 (8 by maintainers)
Top GitHub Comments
Phenomenal, thank you. This clarifies it: I didn't know `cross_val_score` would re-fit, as opposed to just evaluating a `clf` that was already fit, on the CV provided (sorry, I always use `GridSearchCV` here so wasn't sure about it). 👍

I think we should just construct the grid search again in the `cross_val_score` call.
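The point of that clarification can be checked directly: `cross_val_score` clones the estimator and fits the clone on each outer training fold, so the inner grid search is re-run inside every outer split, and the prior `clf.fit` call has no effect on the nested scores. A small sketch (dataset and grid are my assumptions, not from the issue):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = GridSearchCV(SVC(kernel="rbf"),
                   {"C": [1, 10], "gamma": [0.01, 0.1]},
                   cv=KFold(n_splits=4, shuffle=True, random_state=0))

# cross_val_score accepts an unfitted estimator: it clones clf for each
# outer split and calls fit on the clone, re-running the inner search.
scores = cross_val_score(clf, X, y,
                         cv=KFold(n_splits=4, shuffle=True, random_state=0))
print(scores.mean())

# The original clf is left untouched (still unfitted):
assert not hasattr(clf, "best_params_")
```

This is why the published example is genuinely nested despite the earlier `clf.fit(X_iris, y_iris)`: that fit only feeds the non-nested score, while LINE B refits fresh clones fold by fold.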