Remove link argument from TweedieRegressor

Proposal

Remove the argument link from TweedieRegressor and always use a log-link.

Rationale

While PoissonRegressor and GammaRegressor have a built-in log link (prediction = exp(…)), TweedieRegressor has an argument link : {'auto', 'identity', 'log'}, default='auto'.

The problem is that a Tweedie model should always predict positive values, but with link='identity' this goes wrong, for example:

import numpy as np
from sklearn.linear_model import TweedieRegressor


y = np.array([3, 2, 1, 0, 0])
X = np.arange(5)[:, None]
reg = TweedieRegressor(power=1, link='identity')  # power=1 optimizes Poisson deviance
reg.fit(X, y)
reg.predict(X)

Out[7]: array([ 1.95131702,  0.45263687, -1.04604328, -2.54472343, -4.04340359])
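For comparison, here is a minimal sketch of the same fit with link='log', reusing the data from the example above. Because the prediction is then exp(…) of the linear predictor, every predicted value is strictly positive. The variable names reg_log and pred_log are illustrative only.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

# Same toy data as in the example above.
y = np.array([3, 2, 1, 0, 0])
X = np.arange(5)[:, None]

# power=1 optimizes Poisson deviance; the log link maps the linear
# predictor through exp(), so predictions cannot be negative.
reg_log = TweedieRegressor(power=1, link="log")
reg_log.fit(X, y)
pred_log = reg_log.predict(X)
print(pred_log)
```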

The only case where negative predictions are acceptable is power=0, which gives squared error. But anyone who wants to fit squared error is better off using Ridge or a similar estimator.
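As a rough illustration of that last point, the sketch below fits TweedieRegressor with power=0 and an identity link next to Ridge on the same toy data. It assumes the documented objectives: the GLM averages the loss over samples while Ridge sums it, so the Ridge penalty is rescaled by n_samples here. That rescaling, the tightened tol and max_iter, and the variable names are assumptions of this sketch, not part of the original issue.

```python
import numpy as np
from sklearn.linear_model import Ridge, TweedieRegressor

y = np.array([3.0, 2.0, 1.0, 0.0, 0.0])
X = np.arange(5, dtype=float)[:, None]
n_samples = X.shape[0]

# power=0 with an identity link is a penalized squared-error fit.
glm_alpha = 1.0
glm = TweedieRegressor(
    power=0, link="identity", alpha=glm_alpha, tol=1e-8, max_iter=1000
)
glm.fit(X, y)

# Assumed correspondence: the GLM objective is averaged over samples,
# so the matching Ridge penalty is n_samples * glm_alpha.
ridge = Ridge(alpha=n_samples * glm_alpha)
ridge.fit(X, y)

print(glm.predict(X))
print(ridge.predict(X))
```

Under these assumptions the two sets of predictions should agree up to solver tolerance, which is why Ridge is the more direct tool for squared error.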

Issue Analytics

State:
Created 3 years ago
Comments: 5 (3 by maintainers)

Top GitHub Comments

lorentzenchr commented, Jan 29, 2021

It means that this issue is open for discussion and needs a decision by the core devs. A PR/implementation only makes sense after such a decision.

jpbrodsky commented, May 11, 2022

My use case for an identity link (which I also mentioned at https://github.com/scikit-learn/scikit-learn/pull/22548#discussion_r869765737):

I have a problem where I know the expectation y’ follows the linear model y’ = w x. My measurements, y, have Poisson errors.

(The specific problem involves analysis of radiation measurements. The expectation is linear in the amount of source; the measurements are Poisson distributed.)

Using a log link function is just not the right description of my problem. Yes, the whole thing breaks down when evaluating negative values of w, but it seems much better to offer a constraint that avoids ever evaluating negative values of w than to exclude the situations where you have an actual linear relationship with Poisson measurements.
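To make the idea of a constraint concrete, here is a minimal sketch that is not a scikit-learn feature: it fits y’ = w x with Poisson errors by minimizing the Poisson negative log-likelihood directly with scipy.optimize.minimize, using a lower bound to keep w positive. The simulated data, the value of w_true, the bound of 1e-8, and all function and variable names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: expectation linear in x, Poisson-distributed counts.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)
w_true = 2.5
y = rng.poisson(w_true * x)


def poisson_nll(params):
    """Poisson negative log-likelihood (up to a constant) and its gradient."""
    w = params[0]
    mu = w * x  # identity link: the mean is the linear predictor itself
    value = np.sum(mu - y * np.log(mu))
    grad = np.array([np.sum(x) - np.sum(y) / w])
    return value, grad


# The lower bound keeps w strictly positive, so mu never goes negative.
result = minimize(
    poisson_nll, x0=[1.0], jac=True, bounds=[(1e-8, None)], method="L-BFGS-B"
)
w_hat = result.x[0]

# For this one-parameter model the maximum-likelihood estimate has a
# closed form, which serves as a check on the optimizer.
w_closed_form = y.sum() / x.sum()
print(w_hat, w_closed_form)
```

With a single coefficient and no intercept, setting the gradient to zero gives w = sum(y) / sum(x), so the bounded optimizer is only needed in the general multi-feature case.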
