Remove link argument from TweedieRegressor
Proposal
Remove the argument link from TweedieRegressor and always use a log-link.
Rationale
While PoissonRegressor and GammaRegressor have a built-in log-link (prediction = exp(…)), the TweedieRegressor has an argument link : {'auto', 'identity', 'log'}, default='auto'.
The problem is that Tweedie should always predict positive values, but with link='identity' this goes wrong, for example:
import numpy as np
from sklearn.linear_model import TweedieRegressor
y = np.array([3, 2, 1, 0, 0])
X = np.arange(5)[:, None]
reg = TweedieRegressor(power=1, link='identity') # power=1 optimizes Poisson deviance
reg.fit(X, y)
reg.predict(X)
Out[7]: array([ 1.95131702, 0.45263687, -1.04604328, -2.54472343, -4.04340359])
The only case where negative predictions are acceptable is power=0, which gives squared error. But anyone who wants to fit squared error is better off using Ridge and the like.
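For comparison, here is a minimal sketch of the same toy fit with link='log' instead of link='identity'. It assumes a scikit-learn version in which TweedieRegressor still accepts the link argument; the variable names reg_log and pred_log are only illustrative. Because the prediction is the exponential of the linear predictor, it cannot be negative, whatever the fitted coefficients turn out to be.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

# Same toy data as in the identity-link example.
y = np.array([3, 2, 1, 0, 0])
X = np.arange(5)[:, None]

# power=1 optimizes Poisson deviance; link='log' means
# prediction = exp(intercept + X @ coef).
reg_log = TweedieRegressor(power=1, link='log')
reg_log.fit(X, y)
pred_log = reg_log.predict(X)

# exp(.) is strictly positive, so every prediction is above zero.
print(pred_log)
print((pred_log > 0).all())
```

The exact predicted values depend on the default regularization (alpha) and on the solver tolerance, so they are not reproduced here; only their sign is guaranteed by the link.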
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
It means that this issue is open for discussion and needs a decision by the core devs. A PR/implementation only makes sense after such a decision.
My use case for an identity link (which I also mentioned at https://github.com/scikit-learn/scikit-learn/pull/22548#discussion_r869765737):
I have a problem where I know the expectation y’ follows the linear model y’ = w x. My measurements, y, have Poisson errors.
(The specific problem involves analysis of radiation measurements. The expectation is linear with the amount of source; the measurements are Poisson distributed.)
Using a log link function is just not the right description of my problem. Yes, the whole thing breaks down when evaluating negative values of w, but it seems much better to offer a constraint that avoids ever evaluating negative values of w than to exclude the situations where you have an actual linear relationship with Poisson measurements.
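The constrained fit described in this comment can be sketched outside scikit-learn. This is only an illustration under stated assumptions, not an existing TweedieRegressor feature: the data are simulated, the names x, w_true, neg_log_lik and w_hat are hypothetical, and the non-negativity constraint is imposed through the bounds argument of scipy.optimize.minimize with the L-BFGS-B method.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated stand-in for the measurements: the expectation is linear in x
# (identity link, y' = w * x) and the counts are Poisson distributed.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)   # hypothetical "amount of source"
w_true = 2.5                     # hypothetical true slope
y = rng.poisson(w_true * x)

def neg_log_lik(w):
    # Poisson negative log-likelihood, dropping the log(y!) constant.
    mu = w[0] * x
    return np.sum(mu - y * np.log(mu))

def grad(w):
    # Derivative of the negative log-likelihood with respect to w.
    return np.array([np.sum(x - y / w[0])])

# The lower bound keeps w strictly positive, so mu = w * x never
# reaches zero or below and log(mu) stays defined.
res = minimize(neg_log_lik, x0=[1.0], jac=grad,
               bounds=[(1e-8, None)], method="L-BFGS-B")
w_hat = res.x[0]
print(w_hat)
```

For this one-parameter model without an intercept, the Poisson maximum-likelihood estimate has the closed form sum(y) / sum(x), which gives a quick check on the optimizer's result.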