Tweedie deviance loss for tree based models
Describe the workflow you want to enable
If the target y is (approximately) Poisson, Gamma or otherwise Tweedie distributed, it would be beneficial for tree based regressors to support Tweedie deviance loss functions as splitting criteria. This partially addresses #5975.
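As a rough illustration of what such a criterion would compute, here is a minimal sketch of the Tweedie unit deviance for 1 <= power < 2 and of a node impurity built from it. The function names `tweedie_unit_deviance` and `node_impurity` are hypothetical and not part of scikit-learn; only the standard deviance formulas are assumed.

```python
import math


def tweedie_unit_deviance(y, mu, power=1.0):
    """Unit Tweedie deviance for 1 <= power < 2 (hypothetical helper).

    Requires y >= 0 and a strictly positive prediction mu.
    """
    if not 1.0 <= power < 2.0:
        raise ValueError("this sketch only covers 1 <= power < 2")
    if y < 0 or mu <= 0:
        raise ValueError("need y >= 0 and mu > 0")
    if power == 1.0:
        # Poisson case: y * log(y / mu) is taken as 0 when y == 0.
        ylogy = y * math.log(y / mu) if y > 0 else 0.0
        return 2.0 * (ylogy - y + mu)
    p = power
    return 2.0 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )


def node_impurity(ys, power=1.0):
    """Mean deviance of a node that predicts the plain mean of its targets."""
    mu = sum(ys) / len(ys)
    return sum(tweedie_unit_deviance(y, mu, power) for y in ys) / len(ys)
```

A splitter would compare `node_impurity` of the parent with the weighted impurities of the two candidate children. Note that this sketch raises a `ValueError` for a node whose targets are all zero, because the node mean is then not a valid prediction.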
Describe your proposed solution
Ideally, one first implements
- differentiable loss functions #15123
and then adds the different loss criteria to the tree based models:
- DecisionTreeRegressor
- RandomForestRegressor
- GradientBoostingRegressor
- HistGradientBoostingRegressor
Open for Discussion
For Poisson and Tweedie deviance with 1<=power<2, the target y may be zero while the prediction y_pred must be strictly larger than zero. A tree might find a split where one node has y=0 for all samples in that node, resulting naively in y_pred = mean(y) = 0 for that node. I see 3 different solutions to that:
- Use a log-link function, i.e. predict y_pred = np.exp(tree). See #16692 for HistGradientBoostingRegressor. This may not be an option for DecisionTreeRegressor.
- Use a splitting rule that forbids splits where one node has sum(y)=0. One might also introduce an option like min_y_weight, such that splits with sum(sample_weight*y) < min_y_weight are forbidden.
- Use some form of parent-child average y_pred = a * mean(y) + (1-a) * y_pred_parent and forbid further splits, see [1]. (Bayes/credibility theory motivates setting a = sum(sample_weight*y)/(gamma+sum(sample_weight*y)) for some hyperparameter gamma.)
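The three options above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `log_link_predict`, `split_allowed` and `credibility_value` are hypothetical, `min_y_weight` and `gamma` are the proposed (not existing) hyperparameters, and none of this reflects how scikit-learn's tree code is actually organized.

```python
import math


def log_link_predict(raw_value):
    """Option 1: the tree outputs a raw value; exp() keeps y_pred > 0."""
    return math.exp(raw_value)


def split_allowed(y_child, w_child, min_y_weight=0.0):
    """Option 2: forbid a split if a child has sum(w * y) == 0
    or sum(w * y) < min_y_weight (hypothetical option)."""
    wy = sum(w * y for w, y in zip(w_child, y_child))
    return wy > 0 and wy >= min_y_weight


def credibility_value(y_child, w_child, y_pred_parent, gamma=1.0):
    """Option 3: y_pred = a * mean(y) + (1 - a) * y_pred_parent with
    a = sum(w * y) / (gamma + sum(w * y)); gamma > 0 is assumed."""
    wy = sum(w * y for w, y in zip(w_child, y_child))
    weighted_mean = wy / sum(w_child)
    a = wy / (gamma + wy)
    return a * weighted_mean + (1 - a) * y_pred_parent
```

With option 3, a child whose targets are all zero gets a = 0 and therefore inherits the parent's prediction, which stays strictly positive as long as the parent's prediction is.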
There is also a dirty solution that allows y_pred=0 but uses the value max(eps, y_pred) in the loss function for some tiny value of eps.
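A minimal sketch of this clipping workaround for the Poisson case follows. The function name `clipped_poisson_deviance` and the default `eps=1e-10` are assumptions made for illustration. The clipping is written as `max(eps, y_pred)`, since that is the form that keeps the value entering the logarithm strictly positive.

```python
import math


def clipped_poisson_deviance(y, y_pred, eps=1e-10):
    """Poisson unit deviance with the prediction clipped away from zero.

    Hypothetical helper: y_pred == 0 is tolerated, but the loss is
    evaluated at max(eps, y_pred) so that log(y / mu) stays finite.
    """
    mu = max(eps, y_pred)
    # y * log(y / mu) is taken as 0 when y == 0.
    ylogy = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (ylogy - y + mu)
```

The loss stays finite, but a sample with y > 0 in a node predicting zero still receives a very large penalty whose size depends on the arbitrary choice of eps, which is one reason to call the approach dirty.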
References
[1] R rpart library, chapter 8 Poisson regression
Top GitHub Comments
Let’s keep things simple for now and not introduce the power parameter (we don’t have that kind of flexible loss API yet)
Yes please!!
@ogrisel @lorentzenchr can I work on the other cases, or is someone already doing so?