Out-of-fold score calculation for cross-validation
Describe the workflow you want to enable
What roughly happens in cross-validation as of now:
# k-fold cross-validation with a score computed per fold
from numpy import mean, std
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

scores = list()
kfold = KFold(n_splits=10, shuffle=True)
# enumerate splits
for train_ix, test_ix in kfold.split(X):
    # get data
    train_X, test_X = X[train_ix], X[test_ix]
    train_y, test_y = y[train_ix], y[test_ix]
    # fit model
    model = KNeighborsClassifier()
    model.fit(train_X, train_y)
    # evaluate model on the held-out fold
    yhat = model.predict(test_X)
    acc = accuracy_score(test_y, yhat)
    # store the per-fold score
    # ------------- NOTE THIS ---------------------
    scores.append(acc)
    print('> ', acc)
# summarize model performance across folds
# ------------- AND ALSO NOTE THIS ---------------------
mean_s, std_s = mean(scores), std(scores)
print('Mean: %.3f, Standard Deviation: %.3f' % (mean_s, std_s))
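For reference, scikit-learn already packages this per-fold workflow as cross_val_score; a minimal sketch of the equivalent call (assuming the same X, y, and model as above):

from numpy import mean, std
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# per-fold accuracies, one entry per split
scores = cross_val_score(KNeighborsClassifier(), X, y,
                         scoring='accuracy',
                         cv=KFold(n_splits=10, shuffle=True))
print('Mean: %.3f, Standard Deviation: %.3f' % (mean(scores), std(scores)))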
This certainly serves the usual purpose, but there should also be an option to compute the out-of-fold (OOF) score, i.e. a single score over the pooled held-out predictions, as described in the solution section below.
Describe your proposed solution
What the OOF calculation would do:
# k-fold cross-validation with a single out-of-fold (OOF) score
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

data_y, data_yhat = list(), list()
kfold = KFold(n_splits=10, shuffle=True)
# enumerate splits
for train_ix, test_ix in kfold.split(X):
    # get data
    train_X, test_X = X[train_ix], X[test_ix]
    train_y, test_y = y[train_ix], y[test_ix]
    # fit model
    model = KNeighborsClassifier()
    model.fit(train_X, train_y)
    # make predictions on the held-out fold
    yhat = model.predict(test_X)
    # pool the held-out targets and predictions across folds
    # ------------- NOTE THIS ---------------------
    data_y.extend(test_y)
    data_yhat.extend(yhat)
# evaluate the model once on the pooled predictions
# ------------- AND ALSO NOTE THIS ---------------------
acc = accuracy_score(data_y, data_yhat)
print('Accuracy: %.3f' % acc)
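For comparison, scikit-learn's existing cross_val_predict already returns out-of-fold predictions (each sample is predicted by the model that did not see it during fitting), so a sketch of the same OOF score with the current API could look like this (again assuming the same X and y):

from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# one out-of-fold prediction per sample, aligned with the order of X
yhat = cross_val_predict(KNeighborsClassifier(), X, y,
                         cv=KFold(n_splits=10, shuffle=True))
print('Accuracy: %.3f' % accuracy_score(y, yhat))

The feature request is essentially to expose this as a convenience inside the cross-validation utilities themselves.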
Describe alternatives you’ve considered, if relevant
No response
Additional context
An article on OOF predictions and scores: https://machinelearningmastery.com/out-of-fold-predictions-in-machine-learning/
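To make the distinction concrete, here is a self-contained sketch (the make_classification dataset is purely illustrative) that computes both the mean of the per-fold scores and the OOF score on the same splits:

from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
fold_scores, data_y, data_yhat = list(), list(), list()
for train_ix, test_ix in kfold.split(X):
    model = KNeighborsClassifier().fit(X[train_ix], y[train_ix])
    yhat = model.predict(X[test_ix])
    fold_scores.append(accuracy_score(y[test_ix], yhat))  # per-fold score
    data_y.extend(y[test_ix])                             # pooled targets
    data_yhat.extend(yhat)                                # pooled predictions
print('Mean of fold accuracies: %.3f' % mean(fold_scores))
print('OOF accuracy:            %.3f' % accuracy_score(data_y, data_yhat))

For accuracy with equal-sized folds the two numbers coincide up to rounding, but they diverge for unequal fold sizes or for metrics that do not decompose over folds, such as F1 or AUC.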
Issue Analytics
- Created: a year ago
- Comments: 8 (4 by maintainers)
Top GitHub Comments
@thomasjpfan It might be interesting to be able to return the predictions in cross_validate. Indeed, it might be useful in plotting displays if we provide the cv_results in some cases. However, it should not be returned by default and should also come with a big warning about when to use it.
@thomasjpfan @glemaitre
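In that spirit, here is a sketch of how OOF predictions can already be assembled from the public API today, using cross_validate with return_estimator=True and a seeded splitter so the splits can be replayed (assumes X and y are NumPy arrays, as in the examples above):

import numpy as np
from sklearn.model_selection import KFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

kfold = KFold(n_splits=10, shuffle=True, random_state=0)  # seeded: splits replay
results = cross_validate(KNeighborsClassifier(), X, y,
                         cv=kfold, return_estimator=True)
# rebuild the out-of-fold predictions from the per-split fitted estimators
oof = np.empty_like(y)
for est, (_, test_ix) in zip(results['estimator'], kfold.split(X)):
    oof[test_ix] = est.predict(X[test_ix])
print('OOF accuracy: %.3f' % accuracy_score(y, oof))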