Tweedie deviance loss for tree based models


Describe the workflow you want to enable

If the target y is (approximately) Poisson, Gamma or else Tweedie distributed, it would be beneficial for tree based regressors to support Tweedie deviance loss functions as splitting criterion. This partially addresses #5975.
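For reference, the unit Tweedie deviance behind this request can be sketched in a few lines of plain Python. This is a minimal illustration, not scikit-learn's implementation; the function names `tweedie_unit_deviance` and `mean_tweedie_deviance_sketch` are made up for this example, and only `power == 0` and `power >= 1` are handled.

```python
import math

def tweedie_unit_deviance(y, mu, power):
    """Unit Tweedie deviance d(y, mu) of one observation.

    Hypothetical helper for illustration. Requires mu > 0 when power >= 1,
    and additionally y > 0 when power >= 2.
    """
    if power < 0 or 0 < power < 1:
        raise ValueError("this sketch only covers power == 0 and power >= 1")
    if power == 0:  # Normal distribution: squared error
        return (y - mu) ** 2
    if power == 1:  # Poisson
        # y * log(y / mu) is replaced by its limit 0 when y == 0
        ylogy = y * math.log(y / mu) if y > 0 else 0.0
        return 2.0 * (ylogy - y + mu)
    if power == 2:  # Gamma
        return 2.0 * (math.log(mu / y) + y / mu - 1.0)
    # General case, e.g. compound Poisson-Gamma for 1 < power < 2
    return 2.0 * (
        y ** (2 - power) / ((1 - power) * (2 - power))
        - y * mu ** (1 - power) / (1 - power)
        + mu ** (2 - power) / (2 - power)
    )

def mean_tweedie_deviance_sketch(ys, mus, power):
    """Unweighted mean of the unit deviances."""
    return sum(tweedie_unit_deviance(y, m, power) for y, m in zip(ys, mus)) / len(ys)
```

Note that for 1 <= power < 2 the deviance is well defined at y = 0 but not at mu = 0, which is the constraint discussed further down.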

Describe your proposed solution

Ideally, one first implements

  • differentiable loss functions #15123

and then adds the different loss criteria to the tree based models:

  • DecisionTreeRegressor
  • RandomForestRegressor
  • GradientBoostingRegressor
  • HistGradientBoostingRegressor.
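To make "deviance as splitting criterion" concrete, here is a brute-force sketch of a single-feature split search that minimizes the summed Poisson deviance of the two children. It assumes non-negative targets and unit sample weights; `poisson_node_deviance` and `best_split` are hypothetical names, and this is not how scikit-learn's tree builder is implemented.

```python
import math

def poisson_node_deviance(ys):
    """Total Poisson deviance of a node that predicts its own mean.

    At the mean, sum(y - mu) vanishes, so only the y * log(y / mu) terms
    remain. An all-zero node has no such terms and gets deviance 0.
    """
    mu = sum(ys) / len(ys)
    return 2.0 * sum(y * math.log(y / mu) for y in ys if y > 0)

def best_split(xs, ys):
    """Return (threshold, summed child deviance) of the best split, or None."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # cannot separate identical feature values
        left = [ys[i] for i in order[:k]]
        right = [ys[i] for i in order[k:]]
        cost = poisson_node_deviance(left) + poisson_node_deviance(right)
        if best is None or cost < best[1]:
            best = ((lo + hi) / 2.0, cost)
    return best

# Toy data: the two smallest x values have y == 0
threshold, cost = best_split([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 3.0, 5.0])
```

On this toy data the search picks the threshold 2.5, which isolates the two zero-valued samples in one child. That child's mean is 0, so this naive criterion produces exactly the kind of node that the discussion section is concerned with.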

Open for Discussion

For Poisson and Tweedie deviance with 1 <= power < 2, the target y may be zero while the prediction y_pred must be strictly greater than zero. A tree might find a split where one node has y=0 for all samples in that node, naively resulting in y_pred = mean(y) = 0 for that node. I see 3 different solutions to that:

  1. Use a log-link function, i.e. predict y_pred = np.exp(tree). See #16692 for HistGradientBoostingRegressor. This may not be an option for DecisionTreeRegressor.
  2. Use a splitting rule that forbids splits where one node has sum(y)=0. One might also introduce some option like min_y_weight, such that splits with sum(sample_weight*y) < min_y_weight are forbidden.
  3. Use some form of parent-child average y_pred = a * mean(y) + (1-a) * y_pred_parent and forbid further splits, see [1]. (Bayes/credibility theory motivates setting a = sum(sample_weight*y)/(gamma+sum(sample_weight*y)) for some hyperparameter gamma.)
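Options 2 and 3 can be sketched as two small helpers. The names `split_allowed` and `shrunk_leaf_value` are hypothetical, `min_y_weight` and `gamma` are the hyperparameters proposed above, and using the weighted mean for mean(y) is an assumption of this sketch.

```python
def split_allowed(y_child, w_child, min_y_weight=0.0):
    """Option 2: reject a child whose weighted target sum is zero
    or falls below min_y_weight."""
    s = sum(w * y for w, y in zip(w_child, y_child))
    return s > 0 and s >= min_y_weight

def shrunk_leaf_value(y_child, w_child, y_pred_parent, gamma):
    """Option 3: y_pred = a * mean(y) + (1 - a) * y_pred_parent,
    with a = s / (gamma + s) and s = sum(sample_weight * y).

    With gamma > 0, an all-zero child gets a == 0 and therefore inherits
    the strictly positive parent prediction.
    """
    s = sum(w * y for w, y in zip(w_child, y_child))
    mean_y = s / sum(w_child)  # weighted mean (assumption of this sketch)
    a = s / (gamma + s)
    return a * mean_y + (1.0 - a) * y_pred_parent
```

For example, a child with targets [0, 0] is rejected by `split_allowed`, while `shrunk_leaf_value` would return the parent's prediction for it instead of 0.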

There is also a dirty solution that allows y_pred=0 but uses the value max(eps, y_pred) in the loss function for some tiny value of eps.
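A minimal sketch of this workaround for the Poisson case: the prediction is clipped from below at eps, i.e. max(eps, y_pred), before it enters the logarithm. The function name and the default eps are assumptions made for illustration.

```python
import math

def poisson_deviance_clipped(y, y_pred, eps=1e-10):
    """Poisson unit deviance with the prediction clipped from below at eps."""
    mu = max(eps, y_pred)  # keeps log(y / mu) finite when y_pred == 0
    ylogy = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (ylogy - y + mu)
```

The loss stays finite, but a sample with y > 0 in a node predicting 0 still receives a very large penalty that depends on the arbitrary choice of eps, which is part of why this is called a dirty solution.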

References

[1] R rpart library, chapter 8 Poisson regression

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 2
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

1 reaction
NicolasHug commented, Mar 13, 2020

just for Poisson and Gamma or for the whole Tweedie distributions

Let’s keep things simple for now and not introduce the power parameter (we don’t have that kind of flexible loss API yet)

If you’d like, I can give it a shot.

Yes please!!

0 reactions
Reksbril commented, May 10, 2020

@ogrisel @lorentzenchr can I work on the other cases, or is someone already doing so?

