Add option to choose OOB scorer in RandomForest
Describe the workflow you want to enable
Random forests provide, for free, an estimate of their out-of-sample performance via out-of-bag (OOB) predictions (a quick comparison with a held-out score is sketched below). In practice, this works well when
- rows are independent
- the number of trees is large (500+).
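A minimal sketch of that sanity check, assuming a synthetic regression dataset and default settings (none of this is from the original issue): with enough trees, the OOB estimate tracks the score on a held-out set.

```python
# Minimal sketch (synthetic data): the OOB R^2 approximates the held-out R^2
# when the number of trees is large.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=10, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

print(rf.oob_score_)             # OOB estimate of out-of-sample performance
print(rf.score(X_test, y_test))  # held-out estimate for comparison
```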
If `oob_score=True`, the following scoring is currently applied to the OOB predictions:
- `RandomForestRegressor()` always returns `"r2"`, even if the trees are built to minimize Poisson deviance or mean absolute error. R-squared is mainly relevant for the MSE criterion.
- `RandomForestClassifier()` always returns accuracy, even though accuracy does not enjoy many good properties. For instance, if my criterion is "entropy" and I am modelling a binary target, the corresponding scoring function would rather be log loss than accuracy.
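As an illustration of the current behaviour (a sketch on synthetic data, not from the issue): even when the split criterion is entropy, `oob_score_` is plain accuracy on the OOB votes.

```python
# Sketch of the current behaviour: criterion="entropy" does not change the OOB
# metric; oob_score_ is still the accuracy of the OOB class votes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=0)

clf = RandomForestClassifier(
    n_estimators=300, criterion="entropy", oob_score=True, random_state=0
).fit(X, y)

# oob_decision_function_ holds the OOB class probabilities.
oob_pred = clf.classes_[np.argmax(clf.oob_decision_function_, axis=1)]
print(clf.oob_score_)               # built-in OOB score (accuracy)
print(accuracy_score(y, oob_pred))  # should match the line above
```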
I would like to enable the user to pick a scoring function to evaluate OOB performance.
Describe your proposed solution
- To `RandomForestRegressor()`, add an option `oob_scoring="r2"`.
- To `RandomForestClassifier()`, add an option `oob_scoring="accuracy"`.
The value could be any meaningful scoring function from the rich list of scorers, in line with cross-validation scoring: https://scikit-learn.org/stable/modules/model_evaluation.html
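A rough sketch of the idea: the proposed `oob_scoring` parameter does not exist today (the constructor calls below are hypothetical and only shown as comments), but the scoring strings from that page already resolve through the existing scorer registry, which such a parameter could reuse (assumes a scikit-learn version providing `get_scorer_names`).

```python
# Proposed usage (hypothetical -- oob_scoring does not exist today):
#   RandomForestRegressor(criterion="poisson", oob_score=True,
#                         oob_scoring="neg_mean_poisson_deviance")
#   RandomForestClassifier(criterion="entropy", oob_score=True,
#                          oob_scoring="neg_log_loss")
#
# The strings could be resolved with the existing scorer registry:
from sklearn.metrics import get_scorer, get_scorer_names

for name in ("r2", "accuracy", "neg_log_loss", "neg_mean_poisson_deviance"):
    print(name, name in get_scorer_names(), get_scorer(name))
```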
Describe alternatives you’ve considered, if relevant
My current solution is to use the `oob_prediction_` attribute of the fitted model and apply a meaningful scorer to those predictions (see the sketch below).
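A sketch of that workaround under assumed conditions (synthetic strictly positive target, `criterion="poisson"` available in the installed scikit-learn): ignore the built-in R² OOB score and apply the metric that matches the training criterion to `oob_prediction_`.

```python
# Sketch of the workaround: score the stored OOB predictions with the metric
# that matches the training criterion instead of the built-in R^2.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_poisson_deviance

X, y = make_regression(n_samples=1000, n_features=5, random_state=0)
y = np.exp(y / np.abs(y).max())  # strictly positive target for the Poisson criterion

rf = RandomForestRegressor(
    n_estimators=300, criterion="poisson", oob_score=True, random_state=0
).fit(X, y)

print(rf.oob_score_)                                 # built-in OOB score (R^2)
print(mean_poisson_deviance(y, rf.oob_prediction_))  # metric matching the criterion
```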
Additional context
No response
Top GitHub Comments
Yes, with a normal callable it'd be tricky, but in another project (fairlearn) I'm also thinking of callables (normal scoring funcs) which also have `get_metadata_request`. It'd be something like `ConsumerCallable(score_func).score_requests(sample_weight=True)`, which would be passed as a normal score func, but it supports metadata routing and stuff.

We don't need to pass a scorer, we can simply pass a `partial(fbeta_score, beta=.7)`.
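A sketch of what that suggestion could look like in user code, applied by hand to the OOB predictions (only the metric call and the forest attributes are real scikit-learn API; wiring the callable into the estimator is the open proposal, and the data setup is an assumption).

```python
# Sketch: a plain callable built with functools.partial, applied by hand to the
# OOB predictions -- the same callable could one day be handed to the forest.
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import fbeta_score

score_func = partial(fbeta_score, beta=0.7)  # normal score func, no make_scorer needed

X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
clf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0).fit(X, y)

oob_pred = clf.classes_[np.argmax(clf.oob_decision_function_, axis=1)]
print(score_func(y, oob_pred))  # F-beta (beta=0.7) on the OOB predictions
```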