Out-of-fold score calculation for cross-validation
Describe the workflow you want to enable
What roughly happens in cross-validation as of now:
# k-fold cross-validation with a score computed per fold
from numpy import mean, std
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

scores = list()
kfold = KFold(n_splits=10, shuffle=True)
# enumerate splits
for train_ix, test_ix in kfold.split(X):
    # get data
    train_X, test_X = X[train_ix], X[test_ix]
    train_y, test_y = y[train_ix], y[test_ix]
    # fit model
    model = KNeighborsClassifier()
    model.fit(train_X, train_y)
    # evaluate model on the held-out fold
    yhat = model.predict(test_X)
    acc = accuracy_score(test_y, yhat)
    # store the per-fold score
    # ------------- NOTE THIS ---------------------
    scores.append(acc)
    print('> ', acc)
# summarize model performance across folds
# ------------- AND ALSO NOTE THIS ---------------------
mean_s, std_s = mean(scores), std(scores)
print('Mean: %.3f, Standard Deviation: %.3f' % (mean_s, std_s))
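For reference, scikit-learn already packages this per-fold workflow as cross_val_score; a minimal sketch of the equivalent call (assuming the same X, y, and model as above):

from numpy import mean, std
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# per-fold accuracies, one entry per split
scores = cross_val_score(KNeighborsClassifier(), X, y,
                         scoring='accuracy',
                         cv=KFold(n_splits=10, shuffle=True))
print('Mean: %.3f, Standard Deviation: %.3f' % (mean(scores), std(scores)))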
This certainly serves the usual purpose, but there should also be an option to compute the out-of-fold (OOF) score, i.e. a single score over the pooled held-out predictions, as described in the solution section below.
Describe your proposed solution
What the OOF calculation would do:
# k-fold cross-validation with a single out-of-fold (OOF) score
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

data_y, data_yhat = list(), list()
kfold = KFold(n_splits=10, shuffle=True)
# enumerate splits
for train_ix, test_ix in kfold.split(X):
    # get data
    train_X, test_X = X[train_ix], X[test_ix]
    train_y, test_y = y[train_ix], y[test_ix]
    # fit model
    model = KNeighborsClassifier()
    model.fit(train_X, train_y)
    # make predictions on the held-out fold
    yhat = model.predict(test_X)
    # pool the held-out targets and predictions across folds
    # ------------- NOTE THIS ---------------------
    data_y.extend(test_y)
    data_yhat.extend(yhat)
# evaluate the model once on the pooled predictions
# ------------- AND ALSO NOTE THIS ---------------------
acc = accuracy_score(data_y, data_yhat)
print('Accuracy: %.3f' % acc)
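For comparison, scikit-learn's existing cross_val_predict already returns out-of-fold predictions (each sample is predicted by the model that did not see it during fitting), so a sketch of the same OOF score with the current API could look like this (again assuming the same X and y):

from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# one out-of-fold prediction per sample, aligned with the order of X
yhat = cross_val_predict(KNeighborsClassifier(), X, y,
                         cv=KFold(n_splits=10, shuffle=True))
print('Accuracy: %.3f' % accuracy_score(y, yhat))

The feature request is essentially to expose this as a convenience inside the cross-validation utilities themselves.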
Describe alternatives you’ve considered, if relevant
No response
Additional context
An article on OOF predictions and scores: https://machinelearningmastery.com/out-of-fold-predictions-in-machine-learning/
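To make the distinction concrete, here is a self-contained sketch (the make_classification dataset is purely illustrative) that computes both the mean of the per-fold scores and the OOF score on the same splits:

from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
fold_scores, data_y, data_yhat = list(), list(), list()
for train_ix, test_ix in kfold.split(X):
    model = KNeighborsClassifier().fit(X[train_ix], y[train_ix])
    yhat = model.predict(X[test_ix])
    fold_scores.append(accuracy_score(y[test_ix], yhat))  # per-fold score
    data_y.extend(y[test_ix])                             # pooled targets
    data_yhat.extend(yhat)                                # pooled predictions
print('Mean of fold accuracies: %.3f' % mean(fold_scores))
print('OOF accuracy:            %.3f' % accuracy_score(data_y, data_yhat))

For accuracy with equal-sized folds the two numbers coincide up to rounding, but they diverge for unequal fold sizes or for metrics that do not decompose over folds, such as F1 or AUC.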
Issue Analytics
- Created: a year ago
- Comments: 8 (4 by maintainers)
Top GitHub Comments
@thomasjpfan It might be interesting to be able to return the predictions in cross_validate. Indeed, it might be useful in plotting displays if we provide the cv_results in some cases. However, it should not be returned by default and should also come with a big warning about when to use it.
@thomasjpfan @glemaitre
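In that spirit, here is a sketch of how OOF predictions can already be assembled from the public API today, using cross_validate with return_estimator=True and a seeded splitter so the splits can be replayed (assumes X and y are NumPy arrays, as in the examples above):

import numpy as np
from sklearn.model_selection import KFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

kfold = KFold(n_splits=10, shuffle=True, random_state=0)  # seeded: splits replay
results = cross_validate(KNeighborsClassifier(), X, y,
                         cv=kfold, return_estimator=True)
# rebuild the out-of-fold predictions from the per-split fitted estimators
oof = np.empty_like(y)
for est, (_, test_ix) in zip(results['estimator'], kfold.split(X)):
    oof[test_ix] = est.predict(X[test_ix])
print('OOF accuracy: %.3f' % accuracy_score(y, oof))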