Bug in LASSO AIC/BIC formula
Hi, I think that the calculation of sigma in this line is wrong:
According to Zou et al. (2007), Eq. 2.12, the value of sigma is sigma_ols; it should be calculated from the OLS residuals R as:
sigma2 = np.var(R)
I hope this is useful.
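To make the suggestion concrete, here is a minimal sketch of estimating the noise variance from OLS residuals, in the spirit of Zou et al. (2007), Eq. 2.12. The helper name `ols_noise_variance` and the synthetic data are illustrative assumptions, not code from the library under discussion; the sketch assumes `X` has full column rank and `n_samples > n_features`.

```python
import numpy as np

def ols_noise_variance(X, y):
    """Estimate the noise variance as the variance of the OLS
    residuals (sigma2 = np.var(R), as proposed in the issue).

    Illustrative sketch; assumes X has full column rank and
    n_samples > n_features.
    """
    # Ordinary least squares fit via numpy's least-squares solver
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    # np.var divides by n_samples (the biased / ML estimate)
    return np.var(residuals)

# Synthetic example: true noise standard deviation is 0.5
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + rng.normal(scale=0.5, size=50)
sigma2 = ols_noise_variance(X, y)
```

Note that `np.var` divides by `n_samples`, so this is the maximum-likelihood (biased) estimate; the unbiased variant is discussed further down in the thread.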
Issue Analytics
- Created: 3 years ago
- Comments: 9 (7 by maintainers)
Top GitHub Comments
So, looking a bit more at the literature, I found the following:
https://www.sciencedirect.com/science/article/abs/pii/S0893965917301623
In short, it confirms the section “Compare with least squares” from the Wikipedia page:
https://en.wikipedia.org/wiki/Akaike_information_criterion
We can use a surrogate of the AIC by discarding the constant term and computing only the remaining data-dependent terms.
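The exact surrogate expression was not reproduced in the issue text, so the form below is an assumption: one common surrogate for Gaussian errors drops the additive constant `n * (log(2*pi) + 1)` and keeps `n * log(RSS / n) + 2 * df`. A minimal sketch:

```python
import numpy as np

def aic_surrogate(rss, n_samples, df):
    """One common AIC surrogate for Gaussian errors, dropping the
    additive constant n * (log(2*pi) + 1):

        AIC ~ n * log(RSS / n) + 2 * df

    This particular form is an assumption for illustration; the
    exact expression from the original comment was not included
    in the issue text.
    """
    return n_samples * np.log(rss / n_samples) + 2 * df

# Lower is better: a slightly smaller RSS does not justify
# seven extra degrees of freedom in this hypothetical comparison.
aic_small = aic_surrogate(rss=10.0, n_samples=100, df=3)
aic_big = aic_surrogate(rss=9.9, n_samples=100, df=10)
```

Since the constant dropped is the same for every model on the same data, model rankings are unchanged.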
After some more reading and math sketching, AIC and Cp are not equal. It gets more confusing because Cp can be defined in different ways (cf. https://en.wikipedia.org/wiki/Mallows's_Cp#cite_note-4). The relationship between Cp and AIC as defined in ESL is shown here: https://stats.stackexchange.com/a/492385/121348
So we can at least better document which definition of AIC we are using. In addition, we have a bug in the estimator of the noise variance of the OLS. If we want an unbiased estimator, one possible choice is
RSS / (n_samples - n_features)
. However, it does not seem to be well defined for n_features > n_samples, which is problematic. We probably need to look into this a bit more. In addition, for the OLS estimator, @ogrisel was proposing to fit a ridge with a very low penalty to get a stable result even with collinearity. In this case, we control the OLS model and ensure that we always apply the same penalty, which is not the case with the last linear predictor in the LARS path right now.
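The two ideas above can be sketched together: the unbiased estimator `RSS / (n_samples - n_features)`, fitted through a ridge with a very small penalty for numerical stability under collinearity. The function name, the `alpha` value, and the guard for `n_samples <= n_features` are illustrative assumptions, not the library's actual implementation:

```python
import numpy as np

def noise_variance_unbiased(X, y, alpha=1e-8):
    """Unbiased noise-variance estimate RSS / (n_samples - n_features),
    using a ridge fit with a very small penalty for numerical stability
    under collinearity (the proposal discussed in the issue). The alpha
    value is an arbitrary illustrative choice.
    """
    n_samples, n_features = X.shape
    if n_samples <= n_features:
        # The estimator is not defined here; this is the open
        # problem discussed in the issue.
        raise ValueError("need n_samples > n_features")
    # Closed-form ridge solution: (X^T X + alpha * I)^{-1} X^T y
    coef = np.linalg.solve(
        X.T @ X + alpha * np.eye(n_features), X.T @ y
    )
    rss = np.sum((y - X @ coef) ** 2)
    # Divide by the residual degrees of freedom, not n_samples
    return rss / (n_samples - n_features)

# Synthetic example: true noise variance is 0.25
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=200)
sigma2_hat = noise_variance_unbiased(X, y)
```

The tiny `alpha` leaves the solution essentially equal to OLS on well-conditioned problems, while keeping `X.T @ X + alpha * I` invertible when columns of `X` are collinear.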