Tweedie deviance loss for tree based models
Describe the workflow you want to enable
If the target y is (approximately) Poisson, Gamma or, more generally, Tweedie distributed, it would be beneficial for tree based regressors to support Tweedie deviance loss functions as a splitting criterion. This partially addresses #5975.
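As a rough illustration of what a deviance-based splitting criterion would compute, here is a minimal pure-Python sketch for the Poisson case (Tweedie power 1). The helper names poisson_deviance, node_impurity and split_gain are hypothetical and not part of scikit-learn; each node is assumed to predict the unweighted mean of its targets.

```python
import math


def poisson_deviance(y, y_pred):
    """Total Poisson deviance 2 * sum(y * log(y / y_pred) - y + y_pred).

    The term y * log(y / y_pred) is taken as 0 when y == 0.
    y_pred must be strictly positive whenever some y is positive.
    """
    total = 0.0
    for yi in y:
        if yi > 0:
            total += yi * math.log(yi / y_pred)
        total += y_pred - yi
    return 2.0 * total


def node_impurity(y):
    """Deviance of a node that predicts the mean of its own targets."""
    mean = sum(y) / len(y)
    return poisson_deviance(y, mean)


def split_gain(y_left, y_right):
    """Reduction in total deviance obtained by splitting the parent node."""
    parent = node_impurity(y_left + y_right)
    return parent - node_impurity(y_left) - node_impurity(y_right)


# Toy targets: small counts on the left, larger counts on the right.
y_left = [0, 1, 1, 2]
y_right = [4, 5, 7, 8]
gain = split_gain(y_left, y_right)
```

A tree builder would evaluate such a gain for every candidate split and keep the largest one; a real implementation would also handle sample weights and update the sums incrementally.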
Describe your proposed solution
Ideally, one first implements
- differentiable loss functions #15123
and then adds the different loss criteria to the tree based models:
- DecisionTreeRegressor
- RandomForestRegressor
- GradientBoostingRegressor
- HistGradientBoostingRegressor
Open for Discussion
For Poisson and Tweedie deviance with 1<=power<2, the target y may be zero while the prediction y_pred must be strictly larger than zero. A tree might find a split where one node has y=0 for all samples in that node, naively resulting in y_pred = mean(y) = 0 for that node. I see 3 different solutions to that:
- Use a log-link function, i.e. predict y_pred = np.exp(tree). See #16692 for HistGradientBoostingRegressor. This may not be an option for DecisionTreeRegressor.
- Use a splitting rule that forbids splits where one node has sum(y)=0. One might also introduce some option like min_y_weight, such that splits with sum(sample_weight*y) < min_y_weight are forbidden.
- Use some form of parent child average y_pred = a * mean(y) + (1-a) * y_pred_parent and forbid further splits, see [1]. (Bayes/credibility theory motivates setting a = sum(sample_weight*y)/(gamma+sum(sample_weight*y)) for some hyperparameter gamma.)
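The second and third options can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: split_allowed and shrunken_prediction are hypothetical helpers, min_y_weight and gamma are the hyperparameters named above, and mean(y) is assumed to be the sample-weighted mean.

```python
def split_allowed(y_left, w_left, y_right, w_right, min_y_weight=1e-12):
    """Reject a split if either child has sum(sample_weight * y) < min_y_weight."""
    left_mass = sum(w * y for w, y in zip(w_left, y_left))
    right_mass = sum(w * y for w, y in zip(w_right, y_right))
    return left_mass >= min_y_weight and right_mass >= min_y_weight


def shrunken_prediction(y, w, y_pred_parent, gamma=1.0):
    """Parent child average a * mean(y) + (1 - a) * y_pred_parent.

    The weight is a = s / (gamma + s) with s = sum(sample_weight * y),
    so a node with s == 0 simply inherits the parent's prediction.
    """
    s = sum(wi * yi for wi, yi in zip(w, y))
    mean = s / sum(w)  # assumption: weighted mean of the node's targets
    a = s / (gamma + s)
    return a * mean + (1.0 - a) * y_pred_parent


# A child containing only zeros: the split is rejected under the second
# option, and the shrunken prediction stays strictly positive under the third.
ok = split_allowed([0, 0], [1, 1], [3, 5], [1, 1])
pred_zero_node = shrunken_prediction([0, 0], [1, 1], y_pred_parent=2.0)
pred_mixed = shrunken_prediction([2, 4], [1, 1], y_pred_parent=1.0, gamma=6.0)
```

In both cases the prediction of every leaf stays strictly positive as long as the root node has at least one positive target, which is what the Poisson and Tweedie deviances with 1<=power<2 require.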
There is also a dirty solution that allows y_pred=0 but uses the value max(eps, y_pred) in the loss function for some tiny value of eps.
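A minimal sketch of that clipping, assuming it is meant as a lower bound on the prediction (max(eps, y_pred)) so that the logarithm in the Poisson deviance stays finite. The function name clipped_poisson_deviance and the default eps are illustrative choices, not an existing scikit-learn API.

```python
import math


def clipped_poisson_deviance(y, y_pred, eps=1e-10):
    """Poisson deviance evaluated at max(eps, y_pred) instead of y_pred."""
    mu = max(eps, y_pred)  # lower bound keeps log(y / mu) finite
    total = 0.0
    for yi in y:
        if yi > 0:
            total += yi * math.log(yi / mu)
        total += mu - yi
    return 2.0 * total


# A node that predicts exactly 0 is scored on a positive target:
# the loss is large but finite instead of infinite.
loss_at_zero = clipped_poisson_deviance([0, 1], 0.0)
loss_regular = clipped_poisson_deviance([1, 1], 1.0)
```

The resulting loss depends strongly on the arbitrary choice of eps, which is one reason to call this solution dirty.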
References
[1] R rpart library, chapter 8 Poisson regression

Let’s keep things simple for now and not introduce the power parameter (we don’t have that kind of flexible loss API yet)
Yes please!!
@ogrisel @lorentzenchr can I work on the other cases, or is someone already doing so?