
Add option to choose OOB scorer in RandomForest


Describe the workflow you want to enable

Random forests provide, for free, an estimate of their out-of-sample performance through out-of-bag (OOB) predictions. In practice, this works well when

  1. rows are independent
  2. the number of trees is large (500+).
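To make the OOB idea concrete, here is a minimal sketch of computing OOB predictions by hand for bagged regression trees: each row is predicted only by the trees whose bootstrap sample did not contain it. It is an illustration under simplifying assumptions (a plain bootstrap of n rows, DecisionTreeRegressor considering all features at each split, synthetic data from make_regression). The function name manual_oob_predictions is hypothetical, and scikit-learn's own forest implementation differs in details.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor


def manual_oob_predictions(X, y, n_trees=200, seed=0):
    """Hypothetical helper: average each row's predictions over the trees that did not see it."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    pred_sum = np.zeros(n)
    n_votes = np.zeros(n)
    for _ in range(n_trees):
        in_bag = rng.integers(0, n, size=n)  # bootstrap: n row indices drawn with replacement
        oob = np.ones(n, dtype=bool)
        oob[in_bag] = False  # rows never drawn for this tree are out-of-bag
        tree = DecisionTreeRegressor(random_state=0).fit(X[in_bag], y[in_bag])
        if oob.any():
            pred_sum[oob] += tree.predict(X[oob])
            n_votes[oob] += 1
    # Rows that were in-bag for every tree have no OOB estimate and come back as NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return pred_sum / n_votes


X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
oob_pred = manual_oob_predictions(X, y)
has_oob = ~np.isnan(oob_pred)
oob_r2 = r2_score(y[has_oob], oob_pred[has_oob])
print(f"OOB R^2 on {has_oob.sum()} rows: {oob_r2:.3f}")
```

With many trees almost every row is out-of-bag for some of them, which is one reason the estimate needs a large number of trees to be stable.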

If oob_score=True, currently the following scoring is applied to the OOB predictions:

  • RandomForestRegressor() always reports R² (“r2”), even if the trees are built to minimize Poisson deviance or mean absolute error. R-squared is mainly relevant for the MSE criterion. What do you mean, @lorentzenchr?
  • RandomForestClassifier() always reports accuracy, even though accuracy lacks many desirable properties. For instance, if my criterion is “entropy” and I am modelling a binary target, the matching scoring function would be log loss rather than accuracy.
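Both defaults can be reproduced from the fitted OOB attributes. A minimal sketch, assuming synthetic data from make_regression / make_classification and 200 trees (both arbitrary choices for illustration), that recomputes oob_score_ from oob_prediction_ and oob_decision_function_:

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, r2_score

# Regressor: oob_score_ is the R^2 of the OOB predictions.
Xr, yr = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
reg = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0).fit(Xr, yr)
reg_check = r2_score(yr, reg.oob_prediction_)

# Classifier: oob_score_ is the accuracy of the most probable OOB class,
# even when the trees are grown with criterion="entropy".
Xc, yc = make_classification(n_samples=300, random_state=0)
clf = RandomForestClassifier(
    n_estimators=200, criterion="entropy", oob_score=True, random_state=0
).fit(Xc, yc)
oob_labels = clf.classes_[np.argmax(clf.oob_decision_function_, axis=1)]
clf_check = accuracy_score(yc, oob_labels)

print(reg.oob_score_, reg_check)
print(clf.oob_score_, clf_check)
```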

I would like to let users choose the scoring function used to evaluate OOB performance.

Describe your proposed solution

  • To RandomForestRegressor(), add an option oob_scoring="r2" (keeping the current behavior as the default)
  • To RandomForestClassifier(), add an option oob_scoring="accuracy" (keeping the current behavior as the default)

The option could accept any meaningful scoring function from the list of scoring values already used for cross-validation: https://scikit-learn.org/stable/modules/model_evaluation.html
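To illustrate what such an option could do, here is a sketch of a hypothetical helper, oob_scoring_value, that resolves a scoring string to a metric and applies it to a fitted forest's OOB predictions. It is not a scikit-learn API: the helper name, the OOB_METRICS table and its small subset of scorer-style names are assumptions, and only single-output forests are handled. As in cross-validation scoring, "neg_*" entries are negated so that larger is better.

```python
import numpy as np
from sklearn.base import is_classifier
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, log_loss, mean_absolute_error, r2_score

# Hypothetical lookup table: scoring string -> (metric, what it consumes, sign).
OOB_METRICS = {
    "r2": (r2_score, "values", 1.0),
    "neg_mean_absolute_error": (mean_absolute_error, "values", -1.0),
    "accuracy": (accuracy_score, "labels", 1.0),
    "neg_log_loss": (log_loss, "proba", -1.0),
}


def oob_scoring_value(forest, y, oob_scoring):
    """Hypothetical stand-in for an oob_scoring option (single-output forests only)."""
    metric, needs, sign = OOB_METRICS[oob_scoring]
    if not is_classifier(forest):
        if needs != "values":
            raise ValueError(f"{oob_scoring!r} is not a regression metric")
        return sign * metric(y, forest.oob_prediction_)
    proba = forest.oob_decision_function_  # columns follow forest.classes_
    if needs == "proba":
        return sign * metric(y, proba, labels=forest.classes_)
    if needs == "labels":
        return sign * metric(y, forest.classes_[np.argmax(proba, axis=1)])
    raise ValueError(f"{oob_scoring!r} is not a classification metric")


X, y = make_classification(n_samples=300, random_state=0)
clf = RandomForestClassifier(
    n_estimators=200, criterion="entropy", oob_score=True, random_state=0
).fit(X, y)
acc = oob_scoring_value(clf, y, "accuracy")  # same quantity as clf.oob_score_
nll = oob_scoring_value(clf, y, "neg_log_loss")  # matches the "entropy" criterion
print(acc, clf.oob_score_, nll)
```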

Describe alternatives you’ve considered, if relevant

My current workaround is to take the oob_prediction_ attribute of the fitted model and apply the appropriate metric to it.
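A minimal sketch of that workaround, assuming a recent scikit-learn where forests accept criterion="poisson" and synthetic data (a Poisson count target and make_classification). The regressor's OOB predictions are scored with mean Poisson deviance; for the classifier, the analogous attribute is oob_decision_function_, whose OOB class probabilities are scored with log loss:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import log_loss, mean_poisson_deviance

# Count target: trees grown to minimize Poisson deviance, scored with the same deviance.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = rng.poisson(lam=np.exp(0.5 + 0.8 * X[:, 0]))
reg = RandomForestRegressor(
    n_estimators=200, criterion="poisson", oob_score=True, random_state=0
).fit(X, y)
oob_poisson_dev = mean_poisson_deviance(y, reg.oob_prediction_)

# Binary target: trees grown with "entropy", scored with log loss on OOB probabilities.
Xc, yc = make_classification(n_samples=400, random_state=0)
clf = RandomForestClassifier(
    n_estimators=200, criterion="entropy", oob_score=True, random_state=0
).fit(Xc, yc)
oob_log_loss = log_loss(yc, clf.oob_decision_function_, labels=clf.classes_)

print(f"OOB Poisson deviance: {oob_poisson_dev:.3f} (built-in oob_score_ R^2: {reg.oob_score_:.3f})")
print(f"OOB log loss: {oob_log_loss:.3f} (built-in oob_score_ accuracy: {clf.oob_score_:.3f})")
```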

Additional context

No response

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 13 (11 by maintainers)

Top GitHub Comments

adrinjalali commented, Nov 8, 2021 (1 reaction)

Yes, with a normal callable it’d be tricky, but in another project (fairlearn) I’m also thinking of callables (normal scoring funcs) that also have get_metadata_request. It’d be something like ConsumerCallable(score_func).score_requests(sample_weight=True), which would be passed as a normal score func but supports metadata routing and so on.

adrinjalali commented, Nov 8, 2021 (1 reaction)

We don’t need to pass a scorer; we can simply pass a partial(fbeta_score, beta=.7).
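A sketch of what the comment describes: functools.partial binds beta, leaving a plain metric(y_true, y_pred) callable with no scorer object involved. Here it is applied by hand to the most probable OOB class of a fitted classifier; the synthetic data and the variable name oob_metric are assumptions for illustration, and the forest itself is not given the callable.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import fbeta_score

X, y = make_classification(n_samples=300, random_state=0)
clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)

# A plain metric(y_true, y_pred) callable with its extra argument already bound.
oob_metric = partial(fbeta_score, beta=0.7)
oob_labels = clf.classes_[np.argmax(clf.oob_decision_function_, axis=1)]
oob_f07 = oob_metric(y, oob_labels)
print(f"OOB F0.7: {oob_f07:.3f}")
```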


