
Don't allow criterion='mae' for gradient boosting estimators


The MAE criterion for trees was introduced in https://github.com/scikit-learn/scikit-learn/pull/6667. This PR also started exposing the criterion parameter to GradientBoostingClassifier and GradientBoostingRegressor, thus allowing ‘mae’, ‘mse’, and ‘friedman_mse’. Before that, the GBDTs were hardcoded to use ‘friedman_mse’.
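For concreteness, here is what is currently accepted (a minimal sketch; estimator and parameter names as in the current API):

```python
# All three criteria are currently accepted by the GBDT estimators:
from sklearn.ensemble import GradientBoostingRegressor

GradientBoostingRegressor(criterion='friedman_mse')  # the default
GradientBoostingRegressor(criterion='mse')
GradientBoostingRegressor(criterion='mae')  # what this issue proposes to disallow
```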

I think we should stop allowing criterion='mae' for GBDTs.

My understanding of Gradient Boosting is that the trees should be fit to the gradients using a least-squares criterion. If we want to minimize the absolute error, we should be using loss='lad', but the criterion used for splitting the tree nodes should still be a least-squares one (‘mse’ or ‘friedman_mse’). I think that splitting the gradients using mae isn’t methodologically correct.
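To make the loss/criterion distinction concrete, a sketch (parameter values as currently exposed):

```python
from sklearn.ensemble import GradientBoostingRegressor

# Minimize absolute error through the *loss*; the trees fitting the
# gradients are still split with a least-squares criterion:
GradientBoostingRegressor(loss='lad', criterion='friedman_mse')

# What this issue argues is methodologically incorrect:
# splitting the gradient-fitting trees with mae:
GradientBoostingRegressor(loss='lad', criterion='mae')
```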

In his original paper, Friedman does mention the possibility of fitting a tree to the residuals using an lad criterion. But he never suggests fitting the trees to the gradients using lad, which is what we currently allow.

I ran some benchmarks on the PMLB datasets (most of them are balanced, hence accuracy is a decent measure).

[Figure: benchmark accuracy on the PMLB datasets, comparing criterion='mae', 'mse', and 'friedman_mse']

We can see that using criterion=mae usually performs worse than using mse or friedman_mse, even when loss=lad. Also, criterion=mae is 60 times slower than the other criteria (see notebook for details).
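For reference, a minimal sketch of the kind of comparison behind these numbers (not the actual PMLB benchmark; the dataset and sizes here are illustrative, see the notebook for the real setup):

```python
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Illustrative dataset; the real benchmarks use the PMLB datasets.
X, y = make_classification(n_samples=2000, random_state=0)

for criterion in ('friedman_mse', 'mse', 'mae'):
    clf = GradientBoostingClassifier(criterion=criterion, random_state=0)
    tic = perf_counter()
    scores = cross_val_score(clf, X, y, cv=3, scoring='accuracy')
    print(f"{criterion}: accuracy={scores.mean():.3f}, "
          f"time={perf_counter() - tic:.1f}s")
```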

Note: From the benchmarks, friedman_mse does seem to (marginally) outperform mse, so I guess keeping it as the default makes sense. CC @thomasjpfan @lorentzenchr

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
NicolasHug commented, Sep 1, 2020

@nikhilreddybilla28, this issue was already claimed by @madhuracj

1 reaction
nikhilreddybilla28 commented, Sep 1, 2020

take


