
Reconsider default of `max_features` of RandomForestRegressor


In https://github.com/scikit-learn/scikit-learn/issues/7254, there was a long discussion on max_features defaults for random forests. As a consequence, the default “auto” was changed to “sqrt” for RandomForestClassifier, but unfortunately not for RandomForestRegressor. I would like to reconsider this decision.

What to change?

The default of `RandomForestRegressor`'s `max_features="auto"` should point to m/3 or sqrt(m), where m is the number of features.
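For concreteness, a minimal sketch of what the proposal means in user code; today the sqrt(m) and m/3 behaviours have to be requested explicitly (the float form works because `max_features` accepts a fraction of `n_features`):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=30, random_state=0)

# Current default: every split considers all m features.
rf_default = RandomForestRegressor(random_state=0).fit(X, y)

# The proposed defaults, requested explicitly:
rf_sqrt = RandomForestRegressor(max_features="sqrt", random_state=0).fit(X, y)   # sqrt(m)
rf_third = RandomForestRegressor(max_features=1 / 3, random_state=0).fit(X, y)   # m/3, as a fraction
```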

Why?

  1. Good defaults are essential for random forests. The fact that random forests do well even without hyperparameter tuning is one of their few advantages over boosted trees.

  2. Every implementation in R, as well as h2o, uses sqrt(m) or m/3 as the default. R's ranger package, for instance, uses sqrt(m) for both regression and classification: https://github.com/imbs-hl/ranger

  3. Column subsampling per split is the main source of randomness in a random forest; it is what decorrelates the trees. The current default removes this effect: strictly speaking, it does not fit a proper random forest but a bagged tree ensemble (see the sketch below). In my experience, random forests outperform bagged trees in the majority of cases.

  4. Training time is roughly proportional to max_features, i.e. with a better default one could easily run 500 trees instead of 100 at comparable cost.

Note: I am not talking about defaults for completely randomized trees, just about proper random forests.
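To illustrate point 3 with a small sketch (my example, not from the issue): a forest that considers all features at every split is, up to the random number streams, the same procedure as bagging unpruned decision trees.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

# A "random forest" with max_features=None considers all features at each split...
rf_all = RandomForestRegressor(n_estimators=100, max_features=None, random_state=0)

# ...which is the same procedure as bagging unpruned trees on bootstrap samples.
bagged = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100, random_state=0)

# Scores should be very close (not bit-identical: the bootstrap draws differ).
print(cross_val_score(rf_all, X, y, cv=5).mean())
print(cross_val_score(bagged, X, y, cv=5).mean())
```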

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 34 (31 by maintainers)

Top GitHub Comments

4 reactions · lorentzenchr commented, Jun 25, 2021

Curious about the effects of max_features in regression problems, I did an analysis (thanks @thomasjpfan for your sk_encoder_cv repo!), see https://github.com/lorentzenchr/notebooks/blob/master/random_forests_max_features.ipynb.
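The notebook has the full setup; the gist is a 5-fold cross-validation over a grid of max_features values. A minimal sketch of that loop (with a stand-in dataset; the notebook runs over the OpenML datasets listed in the table below):

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate

# Stand-in dataset; the real benchmark uses several OpenML datasets.
X, y = fetch_california_housing(return_X_y=True)

# The grid of max_features settings from the table below.
for label, mf in {"p/3": 1 / 3, "sqrt(p)": "sqrt", "0.9p": 0.9, "p": None}.items():
    cv = cross_validate(
        RandomForestRegressor(max_features=mf, random_state=0, n_jobs=-1),
        X, y, cv=5, scoring="neg_mean_squared_error",
    )
    mse = -cv["test_score"]
    print(f"{label:>8} fit_time={cv['fit_time'].mean():8.2f}s "
          f"mse={mse.mean():.4f} std={mse.std():.4f}")
```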

5-fold CV: MSE and uncertainty (std)

Smaller is better. [Figure: MSE per dataset for each max_features setting; see the linked notebook.]

Fit times

Smaller is better. [Figure: fit time per dataset for each max_features setting; see the linked notebook.]

The full table of results:

| dataset | n_samples | n_features | max_features | fit_time | mse | mse_std |
|---|---|---|---|---|---|---|
| ames | 1460 | 79 | p/3 | 3.006385 | 8.283968e+08 | 2.924086e+08 |
| ames | 1460 | 79 | sqrt(p) | 1.569389 | 8.558911e+08 | 2.757982e+08 |
| ames | 1460 | 79 | 0.9p | 6.063473 | 8.506720e+08 | 2.850547e+08 |
| ames | 1460 | 79 | p | 6.822197 | 8.518196e+08 | 2.753024e+08 |
| taxi | 581835 | 18 | p/3 | 849.588564 | 3.032839e+00 | 2.711375e-01 |
| taxi | 581835 | 18 | sqrt(p) | 589.695271 | 3.243551e+00 | 3.208481e-01 |
| taxi | 581835 | 18 | 0.9p | 1569.435796 | 2.825315e+00 | 2.582628e-01 |
| taxi | 581835 | 18 | p | 1911.511817 | 2.845630e+00 | 2.645234e-01 |
| Allstate_Claims_Severity | 188318 | 130 | p/3 | 1102.842597 | 3.831934e+06 | 1.726923e+05 |
| Allstate_Claims_Severity | 188318 | 130 | sqrt(p) | 450.005303 | 3.848965e+06 | 2.098695e+05 |
| Allstate_Claims_Severity | 188318 | 130 | 0.9p | 2854.968361 | 3.903403e+06 | 1.339955e+05 |
| Allstate_Claims_Severity | 188318 | 130 | p | 2826.416086 | 3.939322e+06 | 1.314348e+05 |
| medical_charges_nominal | 163065 | 11 | p/3 | 192.912416 | 1.250430e+06 | 1.337569e+05 |
| medical_charges_nominal | 163065 | 11 | sqrt(p) | 193.330037 | 1.250430e+06 | 1.337569e+05 |
| medical_charges_nominal | 163065 | 11 | 0.9p | 551.927751 | 1.174209e+06 | 8.158113e+04 |
| medical_charges_nominal | 163065 | 11 | p | 678.158142 | 1.203875e+06 | 8.149356e+04 |
| Bike_Sharing_Demand | 17379 | 12 | p/3 | 9.231804 | 1.434250e-01 | 5.949379e-03 |
| Bike_Sharing_Demand | 17379 | 12 | sqrt(p) | 10.034992 | 1.761788e-01 | 7.120561e-03 |
| Bike_Sharing_Demand | 17379 | 12 | 0.9p | 19.286327 | 1.186538e-01 | 7.947182e-03 |
| Bike_Sharing_Demand | 17379 | 12 | p | 22.553369 | 1.195608e-01 | 7.611101e-03 |
| Brazilian_houses | 10692 | 12 | p/3 | 7.409770 | 3.807240e+07 | 4.088876e+07 |
| Brazilian_houses | 10692 | 12 | sqrt(p) | 6.516470 | 3.862123e+07 | 4.121148e+07 |
| Brazilian_houses | 10692 | 12 | 0.9p | 16.011116 | 3.431596e+07 | 4.073237e+07 |
| Brazilian_houses | 10692 | 12 | p | 18.663614 | 3.362349e+07 | 3.993881e+07 |
| delays_zurich_transport | 27327 | 17 | p/3 | 24.958200 | 1.145454e+04 | 1.076261e+03 |
| delays_zurich_transport | 27327 | 17 | sqrt(p) | 20.981417 | 1.144623e+04 | 1.089244e+03 |
| delays_zurich_transport | 27327 | 17 | 0.9p | 65.121171 | 1.151696e+04 | 9.790258e+02 |
| delays_zurich_transport | 27327 | 17 | p | 72.789688 | 1.162809e+04 | 9.357107e+02 |
| nyc-taxi-green-dec-2016 | 581835 | 14 | p/3 | 627.841052 | 3.530295e+00 | 2.065145e-01 |
| nyc-taxi-green-dec-2016 | 581835 | 14 | sqrt(p) | 546.469134 | 3.618843e+00 | 2.251485e-01 |
| nyc-taxi-green-dec-2016 | 581835 | 14 | 0.9p | 1314.987765 | 3.460904e+00 | 2.716307e-01 |
| nyc-taxi-green-dec-2016 | 581835 | 14 | p | 1480.242792 | 3.500681e+00 | 2.299894e-01 |
| black_friday | 166821 | 9 | p/3 | 63.721927 | 1.396359e+07 | 1.152302e+05 |
| black_friday | 166821 | 9 | sqrt(p) | 64.560735 | 1.396359e+07 | 1.152302e+05 |
| black_friday | 166821 | 9 | 0.9p | 117.040547 | 1.413905e+07 | 1.314728e+05 |
| black_friday | 166821 | 9 | p | 127.570616 | 1.417210e+07 | 1.324711e+05 |
| colleges | 7063 | 49 | p/3 | 20.138269 | 2.398931e-02 | 1.459669e-03 |
| colleges | 7063 | 49 | sqrt(p) | 11.033385 | 2.390907e-02 | 1.477293e-03 |
| colleges | 7063 | 49 | 0.9p | 51.833524 | 2.516671e-02 | 1.865950e-03 |
| colleges | 7063 | 49 | p | 1.128208 | NaN | NaN |
| la_crimes | 1468825 | 25 | p/3 | 3931.144400 | 1.749243e+02 | 3.758375e-01 |
| la_crimes | 1468825 | 25 | sqrt(p) | 2992.126391 | 1.754937e+02 | 3.430521e-01 |
| la_crimes | 1468825 | 25 | 0.9p | 11111.923781 | 1.770801e+02 | 3.952217e-01 |
| la_crimes | 1468825 | 25 | p | 11693.530548 | 1.779913e+02 | 2.790011e-01 |
| particulate-matter-ukair-2017 | 394299 | 9 | p/3 | 305.517630 | 1.999397e+01 | 2.262459e+00 |
| particulate-matter-ukair-2017 | 394299 | 9 | sqrt(p) | 304.227987 | 1.999397e+01 | 2.262459e+00 |
| particulate-matter-ukair-2017 | 394299 | 9 | 0.9p | 673.704647 | 2.266109e+01 | 2.297767e+00 |
| particulate-matter-ukair-2017 | 394299 | 9 | p | 823.500997 | 2.310339e+01 | 2.274735e+00 |
| diamonds | 53940 | 6 | p/3 | 23.660542 | 1.319924e-02 | 1.448878e-04 |
| diamonds | 53940 | 6 | sqrt(p) | 22.460638 | 1.319924e-02 | 1.448878e-04 |
| diamonds | 53940 | 6 | 0.9p | 44.888202 | 1.204376e-02 | 1.839350e-04 |
| diamonds | 53940 | 6 | p | 51.263461 | 1.223234e-02 | 1.880511e-04 |
3 reactions · NicolasHug commented, Jun 29, 2021

We already allow floats, so that's equivalent to `max_features=0.33` and `max_features=0.9`:

"If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split."
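A quick check of that float semantics (my example, not from the thread; each fitted tree exposes the resolved value as `max_features_`):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=100, n_features=30, random_state=0)

rf = RandomForestRegressor(n_estimators=10, max_features=0.33, random_state=0).fit(X, y)
# The fraction resolves to int(0.33 * 30) = 9 features considered per split.
print(rf.estimators_[0].max_features_)  # 9
```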
