Trees with MAE criterion are slow to train
Description
When I use the 'mae' criterion with ExtraTreesRegressor, training takes an extremely long time and appears to never finish. There is no such problem with 'mse'. I have found I am not the only one with this problem. I have tried two versions (0.18 and 0.19.X), but neither helped.
https://www.kaggle.com/c/allstate-claims-severity/discussion/24293
Steps/Code to Reproduce
from sklearn.ensemble import ExtraTreesRegressor
rfr = ExtraTreesRegressor(n_estimators=100,
                          max_features=0.8,
                          criterion='mae',
                          max_depth=6,
                          min_samples_leaf=200,
                          n_jobs=-1,
                          random_state=17,
                          verbose=0)
mod = rfr.fit(train[distilled_features], train['y'])
Expected Results
Training should complete in a reasonable time when criterion='mae' is used.
Actual Results
Training hangs indefinitely in the fit step.
Versions
Darwin-16.1.0-x86_64-i386-64bit Python 3.6.1 |Anaconda 4.4.0 (x86_64)| (default, May 11 2017, 13:04:09) [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] NumPy 1.12.1 SciPy 0.19.0 Scikit-Learn 0.19.X
Issue Analytics
- State:
- Created 6 years ago
- Reactions:25
- Comments:54 (33 by maintainers)
Top GitHub Comments
I noticed this is currently still an issue: my training with "mae" as the criterion does not finish (grid search with gradient-boosted regression trees). I spent a lot of time trying to debug what was wrong before stumbling on this thread. This is why I would propose adding a warning to the documentation (e.g. "training with 'mae' as the criterion means recalculating the loss function after each iteration and can take an extremely long time") to keep future users from getting stuck on the same problem.
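To make the complexity gap concrete, here is a rough sketch (not scikit-learn's actual Cython splitter) of why MAE is so much slower: an MSE split scan can update running sums in O(1) per candidate split, while a naive MAE scan must recompute medians and absolute deviations from scratch at every position, making each feature scan quadratic in the node size.

```python
import numpy as np

def naive_mae_split_cost(y_sorted):
    """Naive MAE impurity scan: recomputes the median and absolute
    deviations for every candidate split position -> O(n^2) per scan."""
    n = len(y_sorted)
    costs = []
    for i in range(1, n):
        left, right = y_sorted[:i], y_sorted[i:]
        costs.append(np.abs(left - np.median(left)).sum()
                     + np.abs(right - np.median(right)).sum())
    return costs

def incremental_mse_split_cost(y_sorted):
    """MSE impurity scan via running sums: O(1) per candidate -> O(n),
    using SSE = sum(y^2) - (sum(y))^2 / n on each side."""
    n = len(y_sorted)
    total, total_sq = y_sorted.sum(), (y_sorted ** 2).sum()
    left_sum = left_sq = 0.0
    costs = []
    for i in range(1, n):
        left_sum += y_sorted[i - 1]
        left_sq += y_sorted[i - 1] ** 2
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        costs.append((left_sq - left_sum ** 2 / i)
                     + (right_sq - right_sum ** 2 / (n - i)))
    return costs
```

No analogous constant-time update exists for the plain median, which is why the real fix discussed in this thread involves an incremental median data structure rather than running sums.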
That commit is written by past Jeremy who was much better informed on the issue. It’s probably the best I can give you without diving into it again, which I haven’t had the time for. I can also say reflecting back that I didn’t succeed because I wasn’t able to set up a good CI cycle for the WIP code (wasn’t as familiar with Cython and testing as I am now), and because I got segmentation faults. I also suspect my code logic around the update was flawed, this could be avoided with some good test cases when implementing the update.
I would suggest starting over from scratch instead of from my very outdated branch, and using the commit as a reference for where you might need to add code more so than for what code you need to add (the descriptions in this issue are better for that).
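On the point above about good test cases for the update logic: one cheap pattern is to check an incremental median structure against brute-force recomputation after every insertion. The `RunningMedian` class below is a hypothetical illustration (an unweighted two-heap lower median, not the weighted-median structure the splitter would actually need), but the property-test shape is the same.

```python
import heapq
import random
import statistics

class RunningMedian:
    """Incremental lower median via two heaps: O(log n) per insert,
    instead of re-sorting the node's targets on every update."""
    def __init__(self):
        self.lo = []  # max-heap (negated) of the lower half, incl. the median
        self.hi = []  # min-heap of the upper half

    def add(self, x):
        heapq.heappush(self.lo, -x)
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        if len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        return -self.lo[0]

# Property-style check: the incremental structure must agree with a
# from-scratch recomputation after every single insertion -- exactly the
# kind of regression test that would have caught a flawed update early.
rng = random.Random(0)
seen, rm = [], RunningMedian()
for _ in range(500):
    x = rng.random()
    seen.append(x)
    rm.add(x)
    assert rm.median() == statistics.median_low(seen)
```

Running the same loop with adversarial inputs (sorted, reversed, duplicated values) would cover the edge cases where incremental split-point updates typically go wrong.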