
Reconsider default of `max_features` of RandomForestRegressor


In https://github.com/scikit-learn/scikit-learn/issues/7254, there was a long discussion on max_features defaults for random forests. As a consequence, the default “auto” was changed to “sqrt” for RandomForestClassifier, but unfortunately not for RandomForestRegressor. I would like to reconsider this decision.

What to change?

The default of `RandomForestRegressor`'s `max_features="auto"` should point to m/3 or sqrt(m), where m is the number of features.
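For concreteness, a minimal sketch of what the proposal means in user code; today the sqrt(m) and m/3 behaviours have to be requested explicitly (the float form works because `max_features` accepts a fraction of `n_features`):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=30, random_state=0)

# Current default: every split considers all m features.
rf_default = RandomForestRegressor(random_state=0).fit(X, y)

# The proposed defaults, requested explicitly:
rf_sqrt = RandomForestRegressor(max_features="sqrt", random_state=0).fit(X, y)   # sqrt(m)
rf_third = RandomForestRegressor(max_features=1 / 3, random_state=0).fit(X, y)   # m/3, as a fraction
```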

Why?

  1. Good defaults are essential for random forests. The fact that random forests do well even without hyperparameter tuning is one of their few advantages over boosted trees.

  2. Every implementation in R, as well as h2o, uses sqrt(m) or m/3 as the default. R's ranger package, for instance, uses sqrt(m) for both regression and classification: https://github.com/imbs-hl/ranger

  3. Column subsampling per split is the main source of randomness in a random forest; it is what decorrelates the trees. The current default removes this effect: strictly speaking, it does not fit a proper random forest but a bagged tree ensemble (see the sketch below). In my experience, random forests outperform bagged trees in the majority of cases.

  4. Training time is roughly proportional to max_features, i.e. with a better default one could easily run 500 trees instead of 100 at comparable cost.

Note: I am not talking about defaults for completely randomized trees, just about proper random forests.
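To illustrate point 3 with a small sketch (my example, not from the issue): a forest that considers all features at every split is, up to the random number streams, the same procedure as bagging unpruned decision trees.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

# A "random forest" with max_features=None considers all features at each split...
rf_all = RandomForestRegressor(n_estimators=100, max_features=None, random_state=0)

# ...which is the same procedure as bagging unpruned trees on bootstrap samples.
bagged = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100, random_state=0)

# Scores should be very close (not bit-identical: the bootstrap draws differ).
print(cross_val_score(rf_all, X, y, cv=5).mean())
print(cross_val_score(bagged, X, y, cv=5).mean())
```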

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 34 (31 by maintainers)

Top GitHub Comments

4 reactions · lorentzenchr commented, Jun 25, 2021

Curious about the effects of max_features in regression problems, I did an analysis (thanks @thomasjpfan for your sk_encoder_cv repo!), see https://github.com/lorentzenchr/notebooks/blob/master/random_forests_max_features.ipynb.
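The notebook has the full setup; the gist is a 5-fold cross-validation over a grid of max_features values. A minimal sketch of that loop (with a stand-in dataset; the notebook runs over the OpenML datasets listed in the table below):

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate

# Stand-in dataset; the real benchmark uses several OpenML datasets.
X, y = fetch_california_housing(return_X_y=True)

# The grid of max_features settings from the table below.
for label, mf in {"p/3": 1 / 3, "sqrt(p)": "sqrt", "0.9p": 0.9, "p": None}.items():
    cv = cross_validate(
        RandomForestRegressor(max_features=mf, random_state=0, n_jobs=-1),
        X, y, cv=5, scoring="neg_mean_squared_error",
    )
    mse = -cv["test_score"]
    print(f"{label:>8} fit_time={cv['fit_time'].mean():8.2f}s "
          f"mse={mse.mean():.4f} std={mse.std():.4f}")
```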

5-fold CV: MSE and uncertainty (std)

Smaller is better. [Figure: MSE per dataset for each max_features setting; see the linked notebook.]

Fit times

Smaller is better. [Figure: fit time per dataset for each max_features setting; see the linked notebook.]

The full table of results:

| dataset | n_samples | n_features | max_features | fit_time | mse | mse_std |
|---|---|---|---|---|---|---|
| ames | 1460 | 79 | p/3 | 3.006385 | 8.283968e+08 | 2.924086e+08 |
| ames | 1460 | 79 | sqrt(p) | 1.569389 | 8.558911e+08 | 2.757982e+08 |
| ames | 1460 | 79 | 0.9p | 6.063473 | 8.506720e+08 | 2.850547e+08 |
| ames | 1460 | 79 | p | 6.822197 | 8.518196e+08 | 2.753024e+08 |
| taxi | 581835 | 18 | p/3 | 849.588564 | 3.032839e+00 | 2.711375e-01 |
| taxi | 581835 | 18 | sqrt(p) | 589.695271 | 3.243551e+00 | 3.208481e-01 |
| taxi | 581835 | 18 | 0.9p | 1569.435796 | 2.825315e+00 | 2.582628e-01 |
| taxi | 581835 | 18 | p | 1911.511817 | 2.845630e+00 | 2.645234e-01 |
| Allstate_Claims_Severity | 188318 | 130 | p/3 | 1102.842597 | 3.831934e+06 | 1.726923e+05 |
| Allstate_Claims_Severity | 188318 | 130 | sqrt(p) | 450.005303 | 3.848965e+06 | 2.098695e+05 |
| Allstate_Claims_Severity | 188318 | 130 | 0.9p | 2854.968361 | 3.903403e+06 | 1.339955e+05 |
| Allstate_Claims_Severity | 188318 | 130 | p | 2826.416086 | 3.939322e+06 | 1.314348e+05 |
| medical_charges_nominal | 163065 | 11 | p/3 | 192.912416 | 1.250430e+06 | 1.337569e+05 |
| medical_charges_nominal | 163065 | 11 | sqrt(p) | 193.330037 | 1.250430e+06 | 1.337569e+05 |
| medical_charges_nominal | 163065 | 11 | 0.9p | 551.927751 | 1.174209e+06 | 8.158113e+04 |
| medical_charges_nominal | 163065 | 11 | p | 678.158142 | 1.203875e+06 | 8.149356e+04 |
| Bike_Sharing_Demand | 17379 | 12 | p/3 | 9.231804 | 1.434250e-01 | 5.949379e-03 |
| Bike_Sharing_Demand | 17379 | 12 | sqrt(p) | 10.034992 | 1.761788e-01 | 7.120561e-03 |
| Bike_Sharing_Demand | 17379 | 12 | 0.9p | 19.286327 | 1.186538e-01 | 7.947182e-03 |
| Bike_Sharing_Demand | 17379 | 12 | p | 22.553369 | 1.195608e-01 | 7.611101e-03 |
| Brazilian_houses | 10692 | 12 | p/3 | 7.409770 | 3.807240e+07 | 4.088876e+07 |
| Brazilian_houses | 10692 | 12 | sqrt(p) | 6.516470 | 3.862123e+07 | 4.121148e+07 |
| Brazilian_houses | 10692 | 12 | 0.9p | 16.011116 | 3.431596e+07 | 4.073237e+07 |
| Brazilian_houses | 10692 | 12 | p | 18.663614 | 3.362349e+07 | 3.993881e+07 |
| delays_zurich_transport | 27327 | 17 | p/3 | 24.958200 | 1.145454e+04 | 1.076261e+03 |
| delays_zurich_transport | 27327 | 17 | sqrt(p) | 20.981417 | 1.144623e+04 | 1.089244e+03 |
| delays_zurich_transport | 27327 | 17 | 0.9p | 65.121171 | 1.151696e+04 | 9.790258e+02 |
| delays_zurich_transport | 27327 | 17 | p | 72.789688 | 1.162809e+04 | 9.357107e+02 |
| nyc-taxi-green-dec-2016 | 581835 | 14 | p/3 | 627.841052 | 3.530295e+00 | 2.065145e-01 |
| nyc-taxi-green-dec-2016 | 581835 | 14 | sqrt(p) | 546.469134 | 3.618843e+00 | 2.251485e-01 |
| nyc-taxi-green-dec-2016 | 581835 | 14 | 0.9p | 1314.987765 | 3.460904e+00 | 2.716307e-01 |
| nyc-taxi-green-dec-2016 | 581835 | 14 | p | 1480.242792 | 3.500681e+00 | 2.299894e-01 |
| black_friday | 166821 | 9 | p/3 | 63.721927 | 1.396359e+07 | 1.152302e+05 |
| black_friday | 166821 | 9 | sqrt(p) | 64.560735 | 1.396359e+07 | 1.152302e+05 |
| black_friday | 166821 | 9 | 0.9p | 117.040547 | 1.413905e+07 | 1.314728e+05 |
| black_friday | 166821 | 9 | p | 127.570616 | 1.417210e+07 | 1.324711e+05 |
| colleges | 7063 | 49 | p/3 | 20.138269 | 2.398931e-02 | 1.459669e-03 |
| colleges | 7063 | 49 | sqrt(p) | 11.033385 | 2.390907e-02 | 1.477293e-03 |
| colleges | 7063 | 49 | 0.9p | 51.833524 | 2.516671e-02 | 1.865950e-03 |
| colleges | 7063 | 49 | p | 1.128208 | NaN | NaN |
| la_crimes | 1468825 | 25 | p/3 | 3931.144400 | 1.749243e+02 | 3.758375e-01 |
| la_crimes | 1468825 | 25 | sqrt(p) | 2992.126391 | 1.754937e+02 | 3.430521e-01 |
| la_crimes | 1468825 | 25 | 0.9p | 11111.923781 | 1.770801e+02 | 3.952217e-01 |
| la_crimes | 1468825 | 25 | p | 11693.530548 | 1.779913e+02 | 2.790011e-01 |
| particulate-matter-ukair-2017 | 394299 | 9 | p/3 | 305.517630 | 1.999397e+01 | 2.262459e+00 |
| particulate-matter-ukair-2017 | 394299 | 9 | sqrt(p) | 304.227987 | 1.999397e+01 | 2.262459e+00 |
| particulate-matter-ukair-2017 | 394299 | 9 | 0.9p | 673.704647 | 2.266109e+01 | 2.297767e+00 |
| particulate-matter-ukair-2017 | 394299 | 9 | p | 823.500997 | 2.310339e+01 | 2.274735e+00 |
| diamonds | 53940 | 6 | p/3 | 23.660542 | 1.319924e-02 | 1.448878e-04 |
| diamonds | 53940 | 6 | sqrt(p) | 22.460638 | 1.319924e-02 | 1.448878e-04 |
| diamonds | 53940 | 6 | 0.9p | 44.888202 | 1.204376e-02 | 1.839350e-04 |
| diamonds | 53940 | 6 | p | 51.263461 | 1.223234e-02 | 1.880511e-04 |
3 reactions · NicolasHug commented, Jun 29, 2021

We already allow floats, so that's equivalent to `max_features=0.33` and `max_features=0.9`:

"If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split."
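A quick check of that float semantics (my example, not from the thread; each fitted tree exposes the resolved value as `max_features_`):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=100, n_features=30, random_state=0)

rf = RandomForestRegressor(n_estimators=10, max_features=0.33, random_state=0).fit(X, y)
# The fraction resolves to int(0.33 * 30) = 9 features considered per split.
print(rf.estimators_[0].max_features_)  # 9
```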
