
Store the OOB Loss for `GradientBoostingClassifier`


Describe the workflow you want to enable

Currently the only OOB-related performance metric we store on GradientBoostingClassifier is oob_improvement_, which is an array of OOB loss decreases per iteration. However, it would also be useful to track the actual OOB loss value at each iteration. This can serve as an estimate of the generalization error and might bypass the need for cross-validation in some cases. It would also help the estimator integrate into the framework proposed in #23391.
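
For illustration, here is a minimal sketch of what is available today (the dataset and hyperparameters are placeholders): only the per-iteration OOB improvement is stored, while the absolute OOB loss is computed internally and then discarded.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# oob_improvement_ is only populated when subsample < 1.0
clf = GradientBoostingClassifier(n_estimators=100, subsample=0.5, random_state=0)
clf.fit(X, y)

print(clf.oob_improvement_.shape)  # (100,): OOB loss decrease per iteration
# The absolute OOB loss at each iteration is computed during fitting but never stored.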

Describe your proposed solution

I propose adding a new attribute, oob_score_ (or alternatively oob_loss_), to GradientBoostingClassifier. It would only be set when subsample < 1 and would be updated at each iteration. We already compute this value during fitting; we just throw it away: https://github.com/scikit-learn/scikit-learn/blob/32f9deaaf27c7ae56898222be9d820ba0fd1054f/sklearn/ensemble/_gb.py#L758-L768
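
To make the intent concrete, here is a rough sketch of the quantity being proposed. Since no such attribute exists yet, the snippet approximates the per-iteration loss with a held-out split and staged_predict_proba rather than the true out-of-bag samples; the proposed attribute would provide this history directly.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=100, subsample=0.5, random_state=0)
clf.fit(X_train, y_train)

# One loss value per boosting iteration, analogous to the proposed oob_scores_ / oob_loss_.
val_losses = np.array(
    [log_loss(y_val, proba) for proba in clf.staged_predict_proba(X_val)]
)
print(val_losses.shape)  # (100,)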

Describe alternatives you’ve considered, if relevant

You might think that the cumulative sum of oob_improvement_ would give us the loss values, and that is almost true: it also requires the OOB loss of the first iteration, which isn't stored anywhere. So this alone doesn't solve the issue.
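
A toy example with made-up numbers makes the gap explicit: the cumulative sum only recovers losses relative to a starting value that scikit-learn never exposes.

import numpy as np

oob_loss = np.array([0.65, 0.60, 0.52, 0.47, 0.45])  # what we would like to store (hypothetical values)
oob_improvement = -np.diff(oob_loss)                  # the per-iteration decreases we currently keep

# cumsum only yields the loss relative to the first value, which is not stored:
recovered = oob_loss[0] - np.cumsum(oob_improvement)
print(np.allclose(recovered, oob_loss[1:]))  # True, but only because we knew oob_loss[0]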

Additional context

No response

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
ogrisel commented, Jun 2, 2022

I am fine with storing:

oob_scores_  # with the full history
oob_score_  # as a convenience for the last element of the previous array.
0 reactions
awinml commented, Nov 3, 2022

/take


