Remove link argument from TweedieRegressor

Proposal

Remove the argument link from TweedieRegressor and always use a log-link.

Rationale

While PoissonRegressor and GammaRegressor have a built-in log link (prediction = exp(…)), TweedieRegressor has an argument link : {'auto', 'identity', 'log'}, default='auto'.

The problem is that a Tweedie model should always predict positive values, but with link='identity' this goes wrong, for example:

import numpy as np
from sklearn.linear_model import TweedieRegressor


y = np.array([3, 2, 1, 0, 0])
X = np.arange(5)[:, None]
reg = TweedieRegressor(power=1, link='identity')  # power=1 optimizes Poisson deviance
reg.fit(X, y)
reg.predict(X)
Out[7]: array([ 1.95131702,  0.45263687, -1.04604328, -2.54472343, -4.04340359])

The only case where negative predictions are acceptable is power=0, which gives squared error. But anyone who wants to fit squared error would be better off using Ridge and the like.
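For comparison, here is a minimal sketch of the same toy data fitted with link='log'. Because the prediction is exp(…) of the linear predictor, it cannot be negative. The variable names reg_log and pred_log are only illustrative and are not part of the original report.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

# Same toy data as in the identity-link example above.
y = np.array([3, 2, 1, 0, 0])
X = np.arange(5)[:, None]

# power=1 optimizes Poisson deviance; the log link maps the linear
# predictor through exp(...), so every prediction is strictly positive.
reg_log = TweedieRegressor(power=1, link="log")
reg_log.fit(X, y)
pred_log = reg_log.predict(X)  # illustrative name, not from the issue
print(pred_log)
```

With the log link the fitted means shrink towards zero for larger X instead of crossing it.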

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
lorentzenchr commented, Jan 29, 2021

It means that this issue is open for discussion and needs a decision by the core devs. A PR/implementation only makes sense after such a decision.

0 reactions
jpbrodsky commented, May 11, 2022

My use case for an identity link (which I also mentioned at https://github.com/scikit-learn/scikit-learn/pull/22548#discussion_r869765737):

I have a problem where I know the expectation y’ follows the linear model y’ = w x. My measurements, y, have Poisson errors.

(The specific problem involves analysis of radiation measurements. The expectation is linear in the amount of source; the measurements are Poisson distributed.)

Using a log link function is just not the right description of my problem. Yes, the whole thing breaks down when evaluating negative values of w, but it seems much better to offer a constraint that avoids ever evaluating negative values of w than to exclude situations where you have an actual linear relationship with Poisson measurements.
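To make the idea of a constraint concrete, here is a rough sketch of a Poisson fit with an identity link and a positivity bound on w. It uses scipy.optimize.minimize directly and is not a scikit-learn feature. The synthetic data, the helper poisson_nll, the value of true_w, and the lower bound 1e-8 are all assumptions made for illustration. It also assumes x is strictly positive and that there is no intercept, in which case the maximum-likelihood estimate has the closed form sum(y) / sum(x), which serves as a sanity check.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic stand-in for the measurements (assumed, not real data):
# the expectation is linear in the amount of source x, the counts are Poisson.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)   # strictly positive amounts of source
true_w = 2.5                     # hypothetical ground-truth slope
y = rng.poisson(true_w * x)

def poisson_nll(params, x, y):
    """Poisson negative log-likelihood with identity link mu = w * x
    (terms that do not depend on w are dropped)."""
    mu = params[0] * x
    return np.sum(mu - y * np.log(mu))

# The bound keeps w positive, so mu = w * x never becomes zero or negative.
result = minimize(poisson_nll, x0=[1.0], args=(x, y),
                  method="L-BFGS-B", bounds=[(1e-8, None)])
w_hat = result.x[0]

# Without an intercept, setting the derivative of the NLL to zero gives
# w = sum(y) / sum(x); the numerical fit should agree with it.
w_closed_form = y.sum() / x.sum()
print(w_hat, w_closed_form)
```

The bound plays the role of the constraint described above: the optimizer never evaluates the likelihood at a negative w, so the identity link stays well defined throughout the fit.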
