Formula for dual gap of elastic net in coordinate descent solver
Describe the bug
The computation of the dual gap for the elastic net in the coordinate descent solver (enet_coordinate_descent) might be wrong.
The elastic net minimizes
Primal(w) = (1/2) * ||y - X w||_2^2 + alpha * ||w||_1 + beta/2 * ||w||_2^2
Lasso
For the pure Lasso, i.e. beta = 0, the dual becomes [1]
Dual(nu) = -1/2 * ||nu||_2^2 - nu'y if ||X'nu||_∞ = max(abs(X'nu)) <= alpha else -∞
with nu = X w - y (i.e. minus the residual R) as a possible dual point. This yields the Lasso dual gap
Primal(w) - Dual(nu)
In case ||X'nu||_∞ > alpha, one uses a down-scaled variable nu in the dual.
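To make the definitions above concrete, here is a plain-NumPy sketch of the Lasso gap computation, including the down-scaling of an infeasible dual point (the function name and exact arrangement are mine, not scikit-learn's code):

```python
import numpy as np

def lasso_dual_gap(X, y, w, alpha):
    """Duality gap for Primal(w) = 1/2 * ||y - X w||_2^2 + alpha * ||w||_1."""
    R = y - X @ w                          # residual; nu = -R
    primal = 0.5 * R @ R + alpha * np.abs(w).sum()
    dual_norm = np.max(np.abs(X.T @ R))    # equals ||X'nu||_inf
    # down-scale nu when it is dual-infeasible (||X'nu||_inf > alpha)
    scale = min(1.0, alpha / dual_norm) if dual_norm > 0 else 1.0
    nu = -scale * R
    dual = -0.5 * nu @ nu - nu @ y
    return primal - dual
```

At an optimum (where nu = X w - y is dual-feasible) the gap is zero; for any w it is non-negative by weak duality.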
Elastic Net
For the elastic net, the dual (see section 5.2.3 in [2]) becomes
Dual(nu) = -1/2 * ||nu||_2^2 - nu'y - 1/(2*beta) * sum_j (|X_j'nu| - alpha)_+^2
with X_j the j-th column of X and x_+ = max(0, x) the positive part.
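A direct transcription of this dual into NumPy (the helper name `enet_dual` is made up for illustration); note that the extra quadratic term keeps the dual finite for every nu, in contrast to the Lasso dual:

```python
import numpy as np

def enet_dual(nu, X, y, alpha, beta):
    """Dual objective of the elastic net, following section 5.2.3 in [2]."""
    excess = np.maximum(np.abs(X.T @ nu) - alpha, 0.0)  # (|X_j'nu| - alpha)_+
    return -0.5 * nu @ nu - nu @ y - excess @ excess / (2.0 * beta)
```

By weak duality, Primal(w) - enet_dual(nu, ...) >= 0 for any w and any nu.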
References
[1] Kim, Seung-Jean, et al. “An Interior-Point Method for Large-Scale $\ell_1$-Regularized Least Squares.” IEEE Journal of Selected Topics in Signal Processing 1 (2007): 606-617.
[2] Mendler-Dünner, Celestine, et al. “Primal-Dual Rates and Certificates.” ICML (2016).
Expected Results
if beta > 0:
    R = y - X @ w  # note: R = -nu
    R_norm2 = R @ R
    gap = R_norm2 - R @ y + alpha * l1_norm + beta / 2 * l2_norm
    gap += 1 / (2 * beta) * np.sum(np.maximum(0, np.abs(X.T @ R) - alpha) ** 2)
At the very least, a test for it, similar to the one for the pure Lasso case in https://github.com/scikit-learn/scikit-learn/blob/2c2f31d68c21b7647557b2f776e86c05954d80bf/sklearn/linear_model/tests/test_coordinate_descent.py#L292
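As a sanity check (not the suggested scikit-learn test itself), one can solve the same objective with a short proximal-gradient loop and verify that the proposed gap formula goes to zero at the optimum; all names here are local to the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = rng.standard_normal(30)
alpha, beta = 1.0, 0.5

# proximal gradient (ISTA) on 1/2*||y - X w||^2 + alpha*||w||_1 + beta/2*||w||_2^2
L = np.linalg.norm(X, 2) ** 2 + beta        # Lipschitz constant of the smooth part
w = np.zeros(X.shape[1])
for _ in range(5000):
    grad = X.T @ (X @ w - y) + beta * w
    z = w - grad / L
    w = np.sign(z) * np.maximum(np.abs(z) - alpha / L, 0.0)  # soft-thresholding

# proposed elastic-net dual gap with nu = -R
R = y - X @ w
gap = R @ R - R @ y + alpha * np.abs(w).sum() + beta / 2 * w @ w
gap += np.sum(np.maximum(0.0, np.abs(X.T @ R) - alpha) ** 2) / (2 * beta)
```

The gap should be non-negative for any iterate and numerically zero once the loop has converged.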
Actual Results
I am starting to be puzzled by the -beta * w term in line 205.
https://github.com/scikit-learn/scikit-learn/blob/2c2f31d68c21b7647557b2f776e86c05954d80bf/sklearn/linear_model/_cd_fast.pyx#L205-L236
Issue Analytics
- Created 2 years ago
- Comments:13 (10 by maintainers)
Top GitHub Comments
I think it’s based on the trick where you write the elastic net as a Lasso with an extended design X of shape (n_samples + n_features, n_features). The Enet objective (without the n_samples dividing the data fit) is a Lasso with the following design matrix and observation vector:
X_tilde = np.vstack([X, np.sqrt(beta) * np.eye(n_features)]), y_tilde = np.concatenate([y, np.zeros(n_features)])
and so
X_tilde.T @ (X_tilde @ w - y_tilde) = X_tilde.T @ np.concatenate([X @ w - y, np.sqrt(beta) * w]) = X.T @ (X @ w - y) + beta * w
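This identity is easy to check numerically (a quick sketch; the extended design is built as described above):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 12, 5
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)
w = rng.standard_normal(n_features)
beta = 0.7

# extended design of the Enet-as-Lasso trick
X_tilde = np.vstack([X, np.sqrt(beta) * np.eye(n_features)])
y_tilde = np.concatenate([y, np.zeros(n_features)])

lhs = X_tilde.T @ (X_tilde @ w - y_tilde)
rhs = X.T @ (X @ w - y) + beta * w
assert np.allclose(lhs, rhs)
```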
Not sure why it should raise a warning. Coordinate descent works fine on this problem (it’s strongly convex). Also, the issue raises a huge number of warnings in ElasticNet.path. For a single value, sure, Ridge regression is fine.

I guess my point is that the solver is not necessarily failing; it is the convergence check that is failing. There are other valid dual problems that would work fine. It’s just that this trick of using a dual obtained from writing an Enet with l1_ratio > 0 fails here, because the dual function is infinite unless X'r == 0.