Tweedie deviance loss for tree based models

Describe the workflow you want to enable

If the target y is (approximately) Poisson, Gamma, or otherwise Tweedie distributed, it would be beneficial for tree-based regressors to support Tweedie deviance loss functions as splitting criteria. This partially addresses #5975.
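To make the request concrete, here is a minimal sketch of what a Poisson deviance splitting criterion could look like for a single feature. It is pure Python and not scikit-learn code; the names `poisson_node_deviance` and `best_split` are hypothetical, and the sketch assumes non-negative targets and unit sample weights. Candidate splits that would leave a child with only zeros are simply skipped here, since such a child has no valid Poisson prediction.

```python
import math


def poisson_node_deviance(y):
    """Summed Poisson deviance of a node that predicts mean(y) for every sample."""
    y_pred = sum(y) / len(y)
    if y_pred <= 0:
        raise ValueError("Poisson deviance needs a strictly positive prediction")
    # With y_pred = mean(y) the (y_pred - y) terms cancel out;
    # the term y * log(y / y_pred) is taken as 0 for y == 0.
    return 2.0 * sum(yi * math.log(yi / y_pred) for yi in y if yi > 0)


def best_split(x, y):
    """Return (threshold, deviance_reduction) of the best split on x, or None."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    parent = poisson_node_deviance(ys)
    best = None
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue  # no threshold separates equal feature values
        left, right = ys[:k], ys[k:]
        if sum(left) == 0 or sum(right) == 0:
            continue  # a child with mean(y) == 0 has no valid prediction
        gain = parent - poisson_node_deviance(left) - poisson_node_deviance(right)
        if best is None or gain > best[1]:
            best = ((xs[k - 1] + xs[k]) / 2.0, gain)
    return best


# Hypothetical toy data: small counts for low x, larger counts for high x.
print(best_split([1, 2, 3, 4], [1, 1, 5, 7]))
```

On this toy data the search picks the threshold between x=2 and x=3, where both children are nearly homogeneous.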

Describe your proposed solution

Ideally, one first implements

differentiable loss functions #15123

and then adds the different loss criteria to the tree based models:

DecisionTreeRegressor
RandomForestRegressor
GradientBoostingRegressor
HistGradientBoostingRegressor.

Open for Discussion

For Poisson and Tweedie deviance with 1<=power<2, the target y may be zero while the prediction y_pred must be strictly larger than zero. A tree might find a split where one node has y=0 for all samples in that node, resulting naively in y_pred = mean(y) = 0 for that node. I see 3 different solutions to that:

1. Use a log-link function, i.e. predict y_pred = np.exp(tree). See #16692 for HistGradientBoostingRegressor. This may not be an option for DecisionTreeRegressor.
2. Use a splitting rule that forbids splits where one node has sum(y)=0. One might also introduce an option like min_y_weight, such that splits with sum(sample_weight*y) < min_y_weight are forbidden.
3. Use some form of parent-child average y_pred = a * mean(y) + (1-a) * y_pred_parent and forbid further splits, see [1]. (Bayes/credibility theory motivates setting a = sum(sample_weight*y)/(gamma+sum(sample_weight*y)) for some hyperparameter gamma.)

There is also a dirty solution that allows y_pred=0 but uses the value max(eps, y_pred) in the loss function for some tiny value of eps.
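The ideas above (apart from the log link) can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions, not scikit-learn API: the function names, the default values, and the use of the weighted mean for mean(y) are all hypothetical choices. The clamp uses max(eps, y_pred), since that is what keeps the value strictly positive.

```python
def split_allowed(y_left, w_left, y_right, w_right, min_y_weight=1e-12):
    """Splitting rule: reject splits where a child has sum(sample_weight * y) < min_y_weight."""
    s_left = sum(w * y for w, y in zip(w_left, y_left))
    s_right = sum(w * y for w, y in zip(w_right, y_right))
    return s_left >= min_y_weight and s_right >= min_y_weight


def credibility_prediction(y, sample_weight, y_pred_parent, gamma):
    """Parent-child average: a * mean(y) + (1 - a) * y_pred_parent.

    Assumes gamma > 0 and a weighted mean; a = s / (gamma + s)
    with s = sum(sample_weight * y).
    """
    s = sum(w * yi for w, yi in zip(sample_weight, y))
    a = s / (gamma + s)
    mean_y = s / sum(sample_weight)
    return a * mean_y + (1.0 - a) * y_pred_parent


def clamp_prediction(y_pred, eps=1e-10):
    """Clamp: keep the value passed to the loss strictly positive."""
    return max(eps, y_pred)


# An all-zero node falls back to the parent's prediction (a == 0).
print(credibility_prediction([0, 0], [1, 1], y_pred_parent=0.5, gamma=1.0))
# A split that isolates the zeros is rejected by the splitting rule.
print(split_allowed([0, 0], [1, 1], [3, 1], [1, 1]))
```

With the parent-child average, a node containing only zeros gets weight a = 0 and therefore inherits the parent's strictly positive prediction, which is why no further clamping is needed in that variant.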

References

[1] R rpart library, chapter 8 Poisson regression

Issue Analytics

State:
Created 4 years ago
Reactions: 2
Comments: 12 (12 by maintainers)

Top GitHub Comments

1 reaction

NicolasHug commented, Mar 13, 2020

just for Poisson and Gamma or for the whole Tweedie distributions

Let’s keep things simple for now and not introduce the power parameter (we don’t have that kind of flexible loss API yet)

If you’d like, I can give it a shot.

Yes please!!

0 reactions

Reksbril commented, May 10, 2020

@ogrisel @lorentzenchr can I work on the other cases, or is someone already doing so?
