least_angle.py in lars_path - shapes not aligned on properly formatted data with specific alpha
Encountered an error similar to https://github.com/scikit-learn/scikit-learn/issues/5873 when running the following code:
a_selection = RandomizedLasso(alpha=0.025, normalize=False, n_jobs=1, random_state=42)
a_selection.fit(X=x_sub, y=y_sub)
I can’t share the data as it’s from clinical trials, but I have noticed that the error disappears (for this particular fit) when I remove the alpha parameter. The run takes two days to complete, so I am worried a different alpha will break on a different dataset. The data frame is properly formatted: the number of rows matches the labels and there are no NaNs. The error I get is:
~/miniconda3/lib/python3.6/site-packages/sklearn/linear_model/randomized_l1.py in fit(self, X, y)
    110             n_jobs=self.n_jobs, verbose=self.verbose,
    111             pre_dispatch=self.pre_dispatch, random_state=self.random_state,
--> 112             sample_fraction=self.sample_fraction, **params)
    113
    114         if scores_.ndim == 1:

~/miniconda3/lib/python3.6/site-packages/sklearn/externals/joblib/memory.py in __call__(self, *args, **kwargs)
    281
    282     def __call__(self, *args, **kwargs):
--> 283         return self.func(*args, **kwargs)
    284
    285     def call_and_shelve(self, *args, **kwargs):

~/miniconda3/lib/python3.6/site-packages/sklearn/linear_model/randomized_l1.py in _resample_model(estimator_func, X, y, scaling, n_resampling, n_jobs, verbose, pre_dispatch, random_state, sample_fraction, **params)
     52                 verbose=max(0, verbose - 1),
     53                 **params)
---> 54             for _ in range(n_resampling)):
     55         scores_ += active_set
     56

~/miniconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    756             # was dispatched. In particular this covers the edge
    757             # case of Parallel used with an exhausted iterator.
--> 758             while self.dispatch_one_batch(iterator):
    759                 self._iterating = True
    760         else:

~/miniconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    606                 return False
    607             else:
--> 608                 self._dispatch(tasks)
    609                 return True
    610

~/miniconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
    569         dispatch_timestamp = time.time()
    570         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 571         job = self._backend.apply_async(batch, callback=cb)
    572         self._jobs.append(job)
    573

~/miniconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self, func, callback)
    107     def apply_async(self, func, callback=None):
    108         """Schedule a func to be run"""
--> 109         result = ImmediateResult(func)
    110         if callback:
    111             callback(result)

~/miniconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self, batch)
    324         # Don't delay the application, to avoid keeping the input
    325         # arguments in memory
--> 326         self.results = batch()
    327
    328     def get(self):

~/miniconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132
    133     def __len__(self):

~/miniconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132
    133     def __len__(self):

~/miniconda3/lib/python3.6/site-packages/sklearn/linear_model/randomized_l1.py in _randomized_lasso(X, y, weights, mask, alpha, verbose, precompute, eps, max_iter)
    171                          copy_Gram=False, alpha_min=np.min(alpha),
    172                          method='lasso', verbose=verbose,
--> 173                          max_iter=max_iter, eps=eps)
    174
    175     if len(alpha) > 1:

~/miniconda3/lib/python3.6/site-packages/sklearn/linear_model/least_angle.py in lars_path(X, y, Xy, Gram, max_iter, alpha_min, method, copy_X, eps, copy_Gram, verbose, return_path, return_n_iter, positive)
    442
    443         # TODO: this could be updated
--> 444         residual = y - np.dot(X[:, :n_active], coef[active])
    445         temp = np.dot(X.T[n_active], residual)
    446

ValueError: shapes (49,17) and (16,) not aligned: 17 (dim 1) != 16 (dim 0)
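The traceback boils down to the active set and the design matrix disagreeing inside lars_path: `n_active` counts 17 columns while `active` holds only 16 indices. A minimal NumPy sketch of the failing expression, using made-up data with the shapes from the error message:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical shapes matching the error message: 49 samples, and
# lars_path believes 17 features are active ...
X = rng.standard_normal((49, 20))
coef = rng.standard_normal(20)
n_active = 17
active = np.arange(16)  # ... but only 16 indices made it into `active`

try:
    # The expression from least_angle.py line 444 (without the `y -` part):
    residual = np.dot(X[:, :n_active], coef[active])
except ValueError as e:
    print(e)  # shapes (49,17) and (16,) not aligned: 17 (dim 1) != 16 (dim 0)
```

The bookkeeping desynchronizes when several features are dropped in one iteration, which is what the accepted workaround below addresses.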
Versions:
Linux-4.10.0-32-generic-x86_64-with-debian-stretch-sid
Python 3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:09:58) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.12.1
SciPy 0.19.1
Scikit-Learn 0.18.2
Happy to provide any additional information if needed.
Lastly, I get a message that RandomizedLasso will be deprecated - what will replace its functionality? Setting the n_jobs parameter to anything beyond 1 breaks the code, which was already reported - I hope the replacement will fix that.
Many thanks!
Issue Analytics
- State:
- Created 6 years ago
- Comments: 28 (13 by maintainers)
Top GitHub Comments
Came across the same problem, and it was always with multiple drops. I changed https://github.com/scikit-learn/scikit-learn/blob/b7c41636907defd0ca210ed2e8e17fd4735567a0/sklearn/linear_model/least_angle.py#L701 to include `n_active -= 1` within the preceding loop, and put https://github.com/scikit-learn/scikit-learn/blob/b7c41636907defd0ca210ed2e8e17fd4735567a0/sklearn/linear_model/least_angle.py#L733-L734 into another loop:

for ii in drop_idx:
    temp = Cov_copy[ii] - np.dot(Gram_copy[ii], coef)
    Cov = np.r_[temp, Cov]
This let me use it again without errors (as part of Autofeat). Hope this helps someone, or that someone sees why it shouldn’t be done.
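The suggested fix amounts to recomputing and re-inserting the covariance entry for every dropped index, not just one. A self-contained NumPy sketch of that loop, with invented stand-ins for the quantities lars_path maintains at a drop step (names follow the snippet above):

```python
import numpy as np

# Invented example values; in lars_path these are the saved covariances,
# the saved Gram matrix, and the current coefficient vector.
Cov_copy = np.array([0.9, 0.7, 0.5, 0.3])
Gram_copy = np.array([[1.0, 0.2, 0.1, 0.0],
                      [0.2, 1.0, 0.3, 0.1],
                      [0.1, 0.3, 1.0, 0.2],
                      [0.0, 0.1, 0.2, 1.0]])
coef = np.array([0.4, 0.0, 0.2, 0.0])
Cov = np.array([0.25])  # covariances of the currently inactive features

# Two indices dropped in the same step: handle each one, as in the comment,
# prepending its recomputed covariance to Cov.
drop_idx = [1, 3]
for ii in drop_idx:
    temp = Cov_copy[ii] - np.dot(Gram_copy[ii], coef)
    Cov = np.r_[temp, Cov]

print(Cov)  # [0.26 0.56 0.25]
```

The original code only performed this update for a single dropped index, which is consistent with the failure appearing only when multiple features are dropped in one iteration.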
@adrinjalali as you recommended, I created a minimal reproducible example: https://github.com/FelixNeutatz/LassoLarsCVBug