
Formula for dual gap of elastic net in coordinate descent solver

See original GitHub issue

Describe the bug

The computation of the dual gap for the elastic net in the coordinate descent solver (enet_coordinate_descent) might be wrong.

The elastic net minimizes

Primal(w) = (1/2) * ||y - X w||_2^2 + alpha * ||w||_1 + beta/2 * ||w||_2^2

Lasso

For the pure Lasso, i.e. beta=0, the dual becomes [1]

Dual(nu) = -1/2 * ||nu||_2^2 - nu'y   if ||X'nu||_∞ = max(abs(X'nu)) <= alpha   else -∞

with nu = Xw - y (i.e. minus the residual R) as a possible dual point. This yields the Lasso dual gap

Primal(w) - Dual(nu)

In case ||X'nu||_∞ > alpha, one rescales nu by alpha / ||X'nu||_∞ so that it becomes dual feasible.
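A quick numpy sketch of this Lasso gap computation (function and variable names are illustrative, not taken from the solver):

```python
import numpy as np

def lasso_dual_gap(X, y, w, alpha):
    """Duality gap for Primal(w) = 1/2 * ||y - X w||^2 + alpha * ||w||_1,
    with dual point nu = X w - y, down-scaled if dual-infeasible."""
    nu = X @ w - y
    xt_nu_inf = np.max(np.abs(X.T @ nu))
    if xt_nu_inf > alpha:
        nu = nu * (alpha / xt_nu_inf)  # enforce ||X'nu||_inf <= alpha
    primal = 0.5 * np.sum((y - X @ w) ** 2) + alpha * np.abs(w).sum()
    dual = -0.5 * nu @ nu - nu @ y
    return primal - dual

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = rng.standard_normal(20)
w = rng.standard_normal(5)
gap = lasso_dual_gap(X, y, w, alpha=1.0)
assert gap >= 0  # weak duality: the gap is nonnegative for any w
```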

Elastic Net

For the elastic net, the dual (see Section 5.2.3 in [2]) becomes

Dual(nu) = -1/2 * ||nu||_2^2 - nu'y - 1/(2*beta) * sum_j (|X_j'nu| - alpha)_+^2

with X_j the j-th column of X and x_+ = max(0, x) the positive part.

References

[1] Kim, Seung-Jean et al. “An Interior-Point Method for Large-Scale ℓ1-Regularized Least Squares.” IEEE Journal of Selected Topics in Signal Processing 1 (2007): 606–617.
[2] Mendler-Dünner, Celestine et al. “Primal-Dual Rates and Certificates.” ICML (2016).

Expected Results

if beta > 0:
    R = y - X @ w  # note: R = -nu
    R_norm2 = R @ R
    gap = R_norm2 - R @ y + alpha * l1_norm + beta/2 * l2_norm
    gap += 1/(2*beta) * np.sum(np.maximum(0, np.abs(X.T @ R) - alpha)**2)

At least a test for it, similar to the pure Lasso case in https://github.com/scikit-learn/scikit-learn/blob/2c2f31d68c21b7647557b2f776e86c05954d80bf/sklearn/linear_model/tests/test_coordinate_descent.py#L292
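A self-contained numerical check of the proposed formula might look as follows. It uses a plain proximal-gradient (ISTA) loop instead of scikit-learn's coordinate descent, and all names are illustrative:

```python
import numpy as np

def enet_dual_gap(X, y, w, alpha, beta):
    """Duality gap for Primal(w) = 1/2*||y - Xw||^2 + alpha*||w||_1 + beta/2*||w||^2,
    using nu = Xw - y (so R = -nu) as the dual point."""
    R = y - X @ w
    gap = R @ R - R @ y + alpha * np.abs(w).sum() + beta / 2 * w @ w
    gap += 1 / (2 * beta) * np.sum(np.maximum(0, np.abs(X.T @ R) - alpha) ** 2)
    return gap

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
y = rng.standard_normal(50)
alpha, beta = 1.0, 0.5

# Minimize the elastic net with proximal gradient: a gradient step on
# 1/2*||y - Xw||^2, then the prox of alpha*||.||_1 + beta/2*||.||_2^2
# (soft-thresholding followed by a 1/(1 + t*beta) shrinkage).
t = 1.0 / np.linalg.norm(X, 2) ** 2  # step size = inverse Lipschitz constant
w = np.zeros(X.shape[1])
for _ in range(10000):
    v = w - t * (X.T @ (X @ w - y))
    w = np.sign(v) * np.maximum(np.abs(v) - t * alpha, 0.0) / (1.0 + t * beta)

gap = enet_dual_gap(X, y, w, alpha, beta)
assert gap >= 0      # weak duality holds for any w
assert gap < 1e-6    # and the gap vanishes at the minimizer
```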

Actual Results

I am puzzled by the -beta * w term in line 205. https://github.com/scikit-learn/scikit-learn/blob/2c2f31d68c21b7647557b2f776e86c05954d80bf/sklearn/linear_model/_cd_fast.pyx#L205-L236

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 13 (10 by maintainers)

Top GitHub Comments

mathurinm commented, Mar 15, 2022 (1 reaction)

I think it’s based on the trick of writing the elastic net as a Lasso with an extended design matrix of shape (n_samples + n_features, n_features). The Enet objective (without the n_samples factor dividing the datafit) is a Lasso with the following design matrix and observation vector:

X_tilde = np.vstack([X, np.sqrt(beta) * np.eye(n_features)])
y_tilde = np.hstack([y, np.zeros(n_features)])

and so X_tilde.T @ (X_tilde @ w - y_tilde) = X_tilde.T @ np.hstack([X @ w - y, np.sqrt(beta) * w]) = X.T @ (X @ w - y) + beta * w
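This identity is easy to verify numerically (a quick sketch with made-up data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 8, 3
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)
w = rng.standard_normal(n_features)
beta = 0.7

# Extended "Lasso-ified" elastic net design and target
X_tilde = np.vstack([X, np.sqrt(beta) * np.eye(n_features)])
y_tilde = np.hstack([y, np.zeros(n_features)])

lhs = X_tilde.T @ (X_tilde @ w - y_tilde)   # gradient in the extended Lasso
rhs = X.T @ (X @ w - y) + beta * w          # gradient of the Enet smooth part
assert np.allclose(lhs, rhs)
```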

jonathan-taylor commented, Aug 3, 2022 (0 reactions)

Not sure why it should raise a warning. Coordinate descent works fine on this problem (it’s strongly convex). Also, the issue raises a huge number of warnings in ElasticNet.path. For a single value, sure, RidgeRegression is fine.

I guess my point is that the solver is not necessarily failing; it is the convergence check that is failing. There are other valid dual problems that would work fine. It’s just that this trick of using a dual derived by writing an ENet with l1_ratio > 0 fails here, because that dual function is infinite unless X'r == 0.
