Trees with MAE criterion are slow to train

Description

When I use the ‘mae’ criterion with ExtraTreesRegressor, training runs for so long that it seems it will never finish. There is no such problem with ‘mse’. I am not the only one who has this problem. I have tried two versions (0.18 and 0.19.X), but neither helped.

https://www.kaggle.com/c/allstate-claims-severity/discussion/24293

Steps/Code to Reproduce

from sklearn.ensemble import ExtraTreesRegressor

# `train` and `distilled_features` are the reporter's own data and are not defined here.
rfr = ExtraTreesRegressor(
    n_estimators=100,
    max_features=0.8,
    criterion='mae',
    max_depth=6,
    min_samples_leaf=200,
    n_jobs=-1,
    random_state=17,
    verbose=0,
)
mod = rfr.fit(train[distilled_features], train['y'])

Expected Results

Training completes in a reasonable time when the ‘mae’ criterion is used.

Actual Results

Training never leaves the fit step.

Versions

Darwin-16.1.0-x86_64-i386-64bit
Python 3.6.1 |Anaconda 4.4.0 (x86_64)| (default, May 11 2017, 13:04:09) [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.19.X

Issue Analytics

  • State: open
  • Created 6 years ago
  • Reactions: 25
  • Comments:54 (33 by maintainers)

Top GitHub Comments

15 reactions
kasuteru commented, May 13, 2019

I noticed this is still an issue: my training with “mae” as criterion does not finish (grid search with gradient boosting regression trees). I spent a lot of time trying to debug what was wrong before stumbling on this thread. This is why I would propose adding a warning to the documentation (e.g. “training with ‘mae’ as criterion means recalculating the loss function after each iteration and can take an extremely long time”) so that future users do not get stuck on the same problem.
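The cost difference described in this comment can be illustrated outside scikit-learn. The following is a minimal pure-Python sketch, not scikit-learn's actual Cython implementation. It assumes a single feature whose targets are already ordered by feature value, and the function names are hypothetical. With squared error, the cost of every candidate split follows from running sums that are updated in constant time. With absolute error, each child's median has to be found again, and this naive version does so from scratch.

```python
from statistics import median


def sse_per_split(y):
    """Squared-error cost of both children for every split point, O(n) overall."""
    n = len(y)
    total_sum = sum(y)
    total_sq = sum(v * v for v in y)
    left_sum = left_sq = 0.0
    costs = []
    for i in range(1, n):  # left child = y[:i], right child = y[i:]
        left_sum += y[i - 1]
        left_sq += y[i - 1] ** 2
        right_sum = total_sum - left_sum
        right_sq = total_sq - left_sq
        # For each child: SSE = sum(v^2) - (sum(v))^2 / count
        cost = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        costs.append(cost)
    return costs


def sae_per_split_naive(y):
    """Absolute-error cost around each child's median, recomputed for every split."""
    costs = []
    for i in range(1, len(y)):
        left, right = y[:i], y[i:]
        m_left, m_right = median(left), median(right)  # full rescan of each child
        costs.append(
            sum(abs(v - m_left) for v in left) + sum(abs(v - m_right) for v in right)
        )
    return costs
```

In this sketch the squared-error pass touches each sample once, while the absolute-error pass rescans both children at every split point, so its cost grows at least quadratically with the number of samples. The real implementation is more sophisticated than this naive loop, but it still lacks the running-sum shortcut that squared error enjoys.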

1 reaction
mcgibbon commented, Jul 6, 2021

That commit was written by past Jeremy, who was much better informed on the issue. It’s probably the best I can give you without diving into it again, which I haven’t had the time for. Reflecting back, I can also say that I didn’t succeed because I wasn’t able to set up a good CI cycle for the WIP code (I wasn’t as familiar with Cython and testing as I am now), and because I got segmentation faults. I also suspect my code logic around the update was flawed; this could be avoided with some good test cases when implementing the update.

I would suggest starting over from scratch instead of from my very outdated branch, and using the commit as a reference for where you might need to add code more so than for what code you need to add (the descriptions in this issue are better for that).
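As an illustration of the kind of test case mentioned above, here is a hedged sketch that checks an incrementally updated median against a brute-force reference on random inputs. It is not the code from the referenced commit. The `RunningMedian` class and `check_against_reference` helper are hypothetical names, the two-heap structure is just one generic way to maintain a median, and it only supports insertion (a split criterion would also need to remove samples as they move from one child to the other).

```python
import heapq
import random
from statistics import median


class RunningMedian:
    """Median of a growing collection, kept with two heaps (insert-only)."""

    def __init__(self):
        self._low = []   # max-heap (values negated): the smaller half
        self._high = []  # min-heap: the larger half

    def add(self, value):
        if self._low and value > -self._low[0]:
            heapq.heappush(self._high, value)
        else:
            heapq.heappush(self._low, -value)
        # Rebalance so that len(low) equals len(high) or len(high) + 1.
        if len(self._low) > len(self._high) + 1:
            heapq.heappush(self._high, -heapq.heappop(self._low))
        elif len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("no values added")
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2


def check_against_reference(trials=200, seed=0):
    """Compare every incremental median with statistics.median on the same prefix."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.randint(-50, 50) for _ in range(rng.randint(1, 30))]
        running = RunningMedian()
        for count, value in enumerate(values, start=1):
            running.add(value)
            assert running.median() == median(values[:count])
    return True
```

Comparing the fast structure with a slow but obviously correct reference after every update is what makes logic errors in the update step visible early.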

