Out of Fold score calculation for Cross validation

Describe the workflow you want to enable

What roughly happens in cross-validation as of now:


# k-fold cross validation
from numpy import mean, std
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

# X and y are assumed to be NumPy arrays holding the features and labels
scores = list()
kfold = KFold(n_splits=10, shuffle=True)
# enumerate splits
for train_ix, test_ix in kfold.split(X):
    # get data
    train_X, test_X = X[train_ix], X[test_ix]
    train_y, test_y = y[train_ix], y[test_ix]
    # fit model
    model = KNeighborsClassifier()
    model.fit(train_X, train_y)
    # evaluate model on this fold only
    yhat = model.predict(test_X)
    acc = accuracy_score(test_y, yhat)
    # store score
    #------------- NOTE THIS ---------------------
    scores.append(acc)
    print('> ', acc)

# summarize model performance across the per-fold scores
#------------- AND ALSO NOTE THIS ---------------------
mean_s, std_s = mean(scores), std(scores)
print('Mean: %.3f, Standard Deviation: %.3f' % (mean_s, std_s))

This serves its purpose: each fold is scored separately and the per-fold scores are then averaged. However, there should also be an option to calculate the out-of-fold (OOF) score, as described in the proposed solution section.

Describe your proposed solution

What OOF would do:

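from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

# X and y are assumed to be NumPy arrays holding the features and labels
data_y, data_yhat = list(), list()
kfold = KFold(n_splits=10, shuffle=True)
# enumerate splits
for train_ix, test_ix in kfold.split(X):
    # get data
    train_X, test_X = X[train_ix], X[test_ix]
    train_y, test_y = y[train_ix], y[test_ix]
    # fit model
    model = KNeighborsClassifier()
    model.fit(train_X, train_y)
    # make predictions
    yhat = model.predict(test_X)
    # store the held-out labels and predictions instead of a per-fold score
    #------------- NOTE THIS ---------------------
    data_y.extend(test_y)
    data_yhat.extend(yhat)

# evaluate the model once, on the pooled out-of-fold predictions
#------------- AND ALSO NOTE THIS ---------------------
acc = accuracy_score(data_y, data_yhat)
print('Accuracy: %.3f' % (acc))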
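
For comparison, the manual loop above can probably be approximated with scikit-learn's existing `cross_val_predict`, which returns one held-out prediction per sample. The following is only a sketch: the synthetic dataset, the sample count, and the fixed `random_state` are assumptions made for illustration and are not part of the original request. Note that the scikit-learn documentation cautions that scoring pooled predictions this way can differ from `cross_val_score` unless all test sets have equal size and the metric decomposes over samples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data, used here only for illustration (assumption).
X, y = make_classification(n_samples=205, n_features=10, random_state=0)

# A fixed random_state makes both calls below see the same splits.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
model = KNeighborsClassifier()

# One out-of-fold prediction per sample, each made by a model
# that was fitted without that sample.
oof_pred = cross_val_predict(model, X, y, cv=kfold)
oof_acc = accuracy_score(y, oof_pred)

# Per-fold accuracies on the same splits, for comparison.
fold_scores = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")
fold_sizes = [len(test_ix) for _, test_ix in kfold.split(X)]

print("OOF accuracy: %.3f" % oof_acc)
print("Mean of fold accuracies: %.3f" % fold_scores.mean())
```

For accuracy, the pooled OOF score is the fold-size-weighted average of the per-fold scores, so the two numbers only diverge when folds have unequal sizes (as they do here, with 205 samples over 10 folds). For metrics that do not decompose over samples, the difference can be larger.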

Describe alternatives you’ve considered, if relevant

No response

Additional context

An article on OOF predictions and score - https://machinelearningmastery.com/out-of-fold-predictions-in-machine-learning/

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

glemaitre commented, Mar 28, 2022 (1 reaction)

I am -1 for adding another parameter to cross_validate

@thomasjpfan It might be interesting to be able to return the predictions in cross_validate. Indeed, it might be useful for plotting displays if we provide the cv_results in some cases. However, the predictions should not be returned by default, and the option should come with a big warning about when to use it.
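
Until something like this exists, one possible workaround is to ask `cross_validate` for the fitted estimators and recompute the held-out predictions by replaying the same splits. This is a hedged sketch rather than a proposed API: the synthetic data and the fixed `random_state` are assumptions, and it relies on the estimators being returned in split order and on the splitter producing identical splits when iterated again.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data, used here only for illustration (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# A fixed random_state makes kfold.split(X) reproducible across calls.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

cv_results = cross_validate(
    KNeighborsClassifier(),
    X,
    y,
    cv=kfold,
    scoring="accuracy",
    return_estimator=True,  # keep the model fitted on each training split
)

# Replay the splits and let each fitted model predict its own test fold.
oof_pred = np.empty_like(y)
for estimator, (_, test_ix) in zip(cv_results["estimator"], kfold.split(X)):
    oof_pred[test_ix] = estimator.predict(X[test_ix])

oof_acc = float(np.mean(oof_pred == y))
print("OOF accuracy: %.3f" % oof_acc)
```

The same caveat from the comment applies: these pooled predictions are convenient for plotting or inspection, but a score computed from them is not necessarily equivalent to the averaged per-fold scores.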

aadarshsingh191198 commented, Apr 6, 2022 (0 reactions)