Store the OOB Loss for `GradientBoostingClassifier`
Describe the workflow you want to enable
Currently the only OOB-related performance metric stored on `GradientBoostingClassifier` is `oob_improvement_`, an array of per-iteration decreases in OOB loss. However, it would also be useful to track the actual OOB loss value at each iteration. This can serve as an estimate of the generalization error and might bypass the need for cross-validation in some cases. It would also help this integrate into my #23391 framework.
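To illustrate what is available today: with `subsample < 1`, the fitted estimator exposes `oob_improvement_`, one entry per boosting iteration, but not the loss values themselves. A minimal sketch:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# subsample < 1 enables out-of-bag estimates; oob_improvement_[i] is the
# improvement in OOB loss at iteration i relative to the previous iteration.
clf = GradientBoostingClassifier(n_estimators=50, subsample=0.5, random_state=0)
clf.fit(X, y)

print(clf.oob_improvement_.shape)  # one entry per boosting iteration: (50,)
```

Only the per-iteration improvements are kept; the loss values they were derived from are discarded during fitting.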
Describe your proposed solution
I propose adding a new attribute, `oob_score_` (or alternatively `oob_loss_`), to `GradientBoostingClassifier`. It would only be set when `subsample < 1`, and it would be updated at each iteration. We are already computing this value during fitting; we just throw it away:
https://github.com/scikit-learn/scikit-learn/blob/32f9deaaf27c7ae56898222be9d820ba0fd1054f/sklearn/ensemble/_gb.py#L758-L768
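In the meantime, a per-iteration loss curve of the same shape can be approximated with a held-out split and staged predictions, at the cost of sacrificing data that OOB estimates would not. This is a workaround sketch, not the proposed attribute:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=50, subsample=0.5, random_state=0)
clf.fit(X_train, y_train)

# Validation loss after each boosting iteration via staged predictions.
# A stored per-iteration OOB loss would give a similar curve without
# needing to carve out a validation set.
val_loss = [log_loss(y_val, proba) for proba in clf.staged_predict_proba(X_val)]
print(len(val_loss))  # one loss value per iteration: 50
```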
Describe alternatives you’ve considered, if relevant
You might think that the cumulative sum of `oob_improvement_` would give us the loss values, and this is almost true, except that we would need the OOB loss of the first iteration, which isn't stored anywhere. So this doesn't solve the issue.
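Concretely, the cumulative sum only recovers the OOB loss curve up to an unknown offset. In the sketch below, `loss_0` is a hypothetical placeholder for the unstored first-iteration OOB loss, which is exactly the missing piece:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)
clf = GradientBoostingClassifier(n_estimators=50, subsample=0.5, random_state=0)
clf.fit(X, y)

# Each improvement is a decrease in OOB loss, so accumulating them recovers
# the shape of the loss curve -- but only relative to an unknown starting
# loss (the hypothetical loss_0), which the estimator does not store.
relative_oob_loss = -np.cumsum(clf.oob_improvement_)
print(relative_oob_loss.shape)  # (50,), offset by the unknown loss_0
```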
Additional context
No response
Issue Analytics
- Created a year ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
I am fine with storing:
/take