Formula for dual gap of elastic net in coordinate descent solver
Describe the bug
The computation of the dual gap for the elastic net in the coordinate descent solver (enet_coordinate_descent) might be wrong.
The elastic net minimizes
Primal(w) = (1/2) * ||y - X w||_2^2 + alpha * ||w||_1 + beta/2 * ||w||_2^2
Lasso
For the pure Lasso, i.e. beta = 0, the dual becomes [1]
Dual(nu) = -1/2 * ||nu||_2^2 - nu'y if ||X'nu||_∞ = max(abs(X'nu)) <= alpha else -∞
with nu = X w - y (i.e. minus the residual R) as a possible dual point. This yields the Lasso dual gap
Primal(w) - Dual(nu)
In case ||X'nu||_∞ > alpha, one uses a down-scaled variable nu in the dual.
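To make the definitions above concrete, here is a plain-NumPy sketch of the Lasso gap computation, including the down-scaling of an infeasible dual point (the function name and exact arrangement are mine, not scikit-learn's code):

```python
import numpy as np

def lasso_dual_gap(X, y, w, alpha):
    """Duality gap for Primal(w) = 1/2 * ||y - X w||_2^2 + alpha * ||w||_1."""
    R = y - X @ w                          # residual; nu = -R
    primal = 0.5 * R @ R + alpha * np.abs(w).sum()
    dual_norm = np.max(np.abs(X.T @ R))    # equals ||X'nu||_inf
    # down-scale nu when it is dual-infeasible (||X'nu||_inf > alpha)
    scale = min(1.0, alpha / dual_norm) if dual_norm > 0 else 1.0
    nu = -scale * R
    dual = -0.5 * nu @ nu - nu @ y
    return primal - dual
```

At an optimum (where nu = X w - y is dual-feasible) the gap is zero; for any w it is non-negative by weak duality.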
Elastic Net
For the elastic net, the dual (see section 5.2.3 in [2]) becomes
Dual(nu) = -1/2 * ||nu||_2^2 - nu'y - 1/(2*beta) * sum_j (|X_j'nu| - alpha)_+^2
with X_j the j-th column of X and x_+ = max(0, x) the positive part.
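A direct transcription of this dual into NumPy (the helper name `enet_dual` is made up for illustration); note that the extra quadratic term keeps the dual finite for every nu, in contrast to the Lasso dual:

```python
import numpy as np

def enet_dual(nu, X, y, alpha, beta):
    """Dual objective of the elastic net, following section 5.2.3 in [2]."""
    excess = np.maximum(np.abs(X.T @ nu) - alpha, 0.0)  # (|X_j'nu| - alpha)_+
    return -0.5 * nu @ nu - nu @ y - excess @ excess / (2.0 * beta)
```

By weak duality, Primal(w) - enet_dual(nu, ...) >= 0 for any w and any nu.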
References
[1] Kim, Seung-Jean, et al. “An Interior-Point Method for Large-Scale $\ell_1$-Regularized Least Squares.” IEEE Journal of Selected Topics in Signal Processing 1 (2007): 606-617.
[2] Mendler-Dünner, Celestine, et al. “Primal-Dual Rates and Certificates.” ICML (2016).
Expected Results
if beta > 0:
    R = y - X @ w  # note: R = -nu
    R_norm2 = R @ R
    gap = R_norm2 - R @ y + alpha * l1_norm + beta / 2 * l2_norm
    gap += 1 / (2 * beta) * np.sum(np.maximum(0, np.abs(X.T @ R) - alpha) ** 2)
At the very least, a test for it, similar to the one for the pure Lasso case in https://github.com/scikit-learn/scikit-learn/blob/2c2f31d68c21b7647557b2f776e86c05954d80bf/sklearn/linear_model/tests/test_coordinate_descent.py#L292
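As a sanity check (not the suggested scikit-learn test itself), one can solve the same objective with a short proximal-gradient loop and verify that the proposed gap formula goes to zero at the optimum; all names here are local to the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = rng.standard_normal(30)
alpha, beta = 1.0, 0.5

# proximal gradient (ISTA) on 1/2*||y - X w||^2 + alpha*||w||_1 + beta/2*||w||_2^2
L = np.linalg.norm(X, 2) ** 2 + beta        # Lipschitz constant of the smooth part
w = np.zeros(X.shape[1])
for _ in range(5000):
    grad = X.T @ (X @ w - y) + beta * w
    z = w - grad / L
    w = np.sign(z) * np.maximum(np.abs(z) - alpha / L, 0.0)  # soft-thresholding

# proposed elastic-net dual gap with nu = -R
R = y - X @ w
gap = R @ R - R @ y + alpha * np.abs(w).sum() + beta / 2 * w @ w
gap += np.sum(np.maximum(0.0, np.abs(X.T @ R) - alpha) ** 2) / (2 * beta)
```

The gap should be non-negative for any iterate and numerically zero once the loop has converged.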
Actual Results
I am starting to be puzzled by the -beta * w term in line 205.
https://github.com/scikit-learn/scikit-learn/blob/2c2f31d68c21b7647557b2f776e86c05954d80bf/sklearn/linear_model/_cd_fast.pyx#L205-L236
Issue Analytics
- Created 2 years ago
- Comments:13 (10 by maintainers)
Top GitHub Comments
I think it’s based on the trick where you write the elastic net as a Lasso with an extended design X of shape (n_samples + n_features, n_features). The Enet objective (without the n_samples dividing the data fit) is a Lasso with the following design matrix and observation vector:
X_tilde = np.vstack([X, np.sqrt(beta) * np.eye(n_features)]), y_tilde = np.concatenate([y, np.zeros(n_features)])
and so
X_tilde.T @ (X_tilde @ w - y_tilde) = X_tilde.T @ np.concatenate([X @ w - y, np.sqrt(beta) * w]) = X.T @ (X @ w - y) + beta * w
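This identity is easy to check numerically (a quick sketch; the extended design is built as described above):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 12, 5
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)
w = rng.standard_normal(n_features)
beta = 0.7

# extended design of the Enet-as-Lasso trick
X_tilde = np.vstack([X, np.sqrt(beta) * np.eye(n_features)])
y_tilde = np.concatenate([y, np.zeros(n_features)])

lhs = X_tilde.T @ (X_tilde @ w - y_tilde)
rhs = X.T @ (X @ w - y) + beta * w
assert np.allclose(lhs, rhs)
```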
Not sure why it should raise a warning. Coordinate descent works fine on this problem (it’s strongly convex). Also, the issue raises a huge number of warnings in ElasticNet.path. For a single value, sure, Ridge regression is fine.

I guess my point is that the solver is not necessarily failing; it is the convergence check that is failing. There are other valid dual problems that would work fine. It’s just that this trick of using a dual obtained from writing an Enet with l1_ratio > 0 fails here, because the dual function is infinite unless X'r == 0.