ARDRegression still crashes when trained on some constant y
Description
Recently, it was reported in #10092 that BayesianRidge and ARDRegression have an issue with training on constant labels. A fix was made in #10095, but it seems the problem is not completely fixed. I have found some specific examples on which it still fails, although it works fine for many others.
Steps/Code to Reproduce
Download the pickle file here: https://www.dropbox.com/s/ytb7y2o4ij8kwbu/ard_bug_data.pickle?dl=0 The following code will error out:
import pickle
from sklearn.linear_model import ARDRegression
with open('ard_bug_data.pickle', 'rb') as f:
    Xtr, ytr, Xts = pickle.load(f)
ard = ARDRegression()
ard.fit(Xtr, ytr)
mus, stds = ard.predict(Xts, return_std=True)
Expected Results
No error is thrown. mus is a vector of 97 values that are in ytr. stds should likely be zeros, but I am not sure.
Actual Results
The error is as follows:
Traceback (most recent call last):
  File "<ipython-input-8-acf07ddd7964>", line 10, in <module>
    ard.predict(Xts, return_std=True)
  File "c:\users\sergey\github\scikit-learn\sklearn\linear_model\bayes.py", line 540, in predict
    sigmas_squared_data = (np.dot(X, self.sigma_) * X).sum(axis=1)
ValueError: shapes (97,0) and (1,1) not aligned: 0 (dim 1) != 1 (dim 0)
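The shapes in the message suggest that, at prediction time, no feature columns were selected from X while sigma_ still described one feature. The NumPy-only sketch below reproduces the same kind of mismatch. It assumes (this is not shown in the traceback) that predict slices X with a boolean mask of kept features; keep_mask, sigma and the feature count are hypothetical stand-ins, not the estimator's actual state.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(97, 1)  # 97 test samples; one feature is an assumed size

# Hypothetical inconsistent state: the mask says no feature survived
# pruning, but the stored covariance still has one row and one column.
keep_mask = np.array([False])
sigma = np.eye(1)

X_kept = X[:, keep_mask]  # shape (97, 0)

try:
    # Same expression as in the traceback, applied to the stand-ins.
    (np.dot(X_kept, sigma) * X_kept).sum(axis=1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(X_kept.shape, sigma.shape)
print(error_message)
```

If that assumption holds, the error comes from the mask and sigma_ being computed at different points of the fit rather than from predict itself.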
Versions
Windows-10-10.0.15063-SP0
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 12:30:02) [MSC v.1900 64 bit (AMD64)]
NumPy 1.13.3
SciPy 0.19.1
Scikit-Learn 0.20.dev0
cc @glemaitre
Issue Analytics
- State:
- Created 6 years ago
- Comments:28 (28 by maintainers)
Top GitHub Comments
So in the current code, the only thing missing is to update coef_ and sigma_ once the for loop is finished. So I would probably move the estimate of coef_ and lambda_ into a function and call it in the for loop and once more after the loop.
I also saw that we have a different criterion to stop the iteration. In Tipping's code it would correspond to:
Not sure which is best.
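The refactoring suggested in the comment above could look roughly like the sketch below. It is a simplified ARD-style loop, not the scikit-learn implementation: update_sigma, update_coef and fit_ard_sketch are hypothetical names, the intercept is ignored, there is no convergence check, and a direct pseudo-inverse is used for the covariance. The point it illustrates is only that the same two helpers run inside the loop and once more after it, so sigma and coef always agree with the final pruning mask.

```python
import numpy as np


def update_sigma(X, alpha, lambda_, keep):
    """Posterior covariance of the weights for the features still kept."""
    X_keep = X[:, keep]
    precision = np.diag(lambda_[keep]) + alpha * X_keep.T @ X_keep
    return np.linalg.pinv(precision)


def update_coef(X, y, coef, alpha, keep, sigma):
    """Posterior mean of the kept weights; pruned weights are set to zero."""
    coef[keep] = alpha * sigma @ X[:, keep].T @ y
    coef[~keep] = 0.0
    return coef


def fit_ard_sketch(X, y, n_iter=50, threshold_lambda=1e4, eps=1e-6):
    n_samples, n_features = X.shape
    alpha = 1.0 / (np.var(y) + eps)  # noise precision
    lambda_ = np.ones(n_features)    # per-weight precisions
    keep = np.ones(n_features, dtype=bool)
    coef = np.zeros(n_features)

    for _ in range(n_iter):
        sigma = update_sigma(X, alpha, lambda_, keep)
        coef = update_coef(X, y, coef, alpha, keep, sigma)

        # Re-estimate the precisions from the current posterior.
        gamma = 1.0 - lambda_[keep] * np.diag(sigma)
        lambda_[keep] = (gamma + 2 * eps) / (coef[keep] ** 2 + 2 * eps)
        residual = np.sum((y - X @ coef) ** 2)
        alpha = (n_samples - gamma.sum() + 2 * eps) / (residual + 2 * eps)

        # Pruning may change the mask after sigma was computed.
        keep = lambda_ < threshold_lambda

    # Recompute once after the loop so sigma and coef match the final mask.
    sigma = update_sigma(X, alpha, lambda_, keep)
    coef = update_coef(X, y, coef, alpha, keep, sigma)
    return coef, sigma, keep


# Usage on synthetic data (assumed, not the data from the issue).
rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = 2.0 * X[:, 0] + 0.1 * rng.randn(50)
coef, sigma, keep = fit_ard_sketch(X, y)

# The predictive-variance expression from the traceback is now well defined.
variances = (np.dot(X[:, keep], sigma) * X[:, keep]).sum(axis=1)
print(keep, sigma.shape, variances.shape)
```

Without the two calls after the loop, the last pruning step could leave sigma with more rows than there are kept features, which is the kind of inconsistency the traceback points at.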