
ARDRegression still crashes when trained on some constant y

Description

Recently, it was reported in #10092 that BayesianRidge and ARDRegression fail when trained on constant labels, and a fix was made in #10095. However, the problem does not seem to be completely fixed: I have found some specific datasets on which ARDRegression still fails, although it works fine on many others.

Steps/Code to Reproduce

Download the pickle file here: https://www.dropbox.com/s/ytb7y2o4ij8kwbu/ard_bug_data.pickle?dl=0

The following code raises an error:

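import pickle
from sklearn.linear_model import ARDRegression

# Load the training features, the constant training labels and the test features
with open('ard_bug_data.pickle', 'rb') as f:
    Xtr, ytr, Xts = pickle.load(f)

ard = ARDRegression()
ard.fit(Xtr, ytr)
# Fails here: predict() with return_std=True raises a ValueError
mus, stds = ard.predict(Xts, return_std=True)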

Expected Results

No error is thrown. mus is a vector of 97 values, all equal to the constant value in ytr. stds should probably be zeros, but I am not sure.

Actual Results

The error is as follows:

Traceback (most recent call last):

  File "<ipython-input-8-acf07ddd7964>", line 10, in <module>
    ard.predict(Xts, return_std=True)

  File "c:\users\sergey\github\scikit-learn\sklearn\linear_model\bayes.py", line 540, in predict
    sigmas_squared_data = (np.dot(X, self.sigma_) * X).sum(axis=1)

ValueError: shapes (97,0) and (1,1) not aligned: 0 (dim 1) != 1 (dim 0)
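The shapes in the traceback suggest what went wrong: predict appears to keep only the columns whose lambda_ is below threshold_lambda, and here every feature has been pruned, so X has shape (97, 0), while sigma_ is still the 1 x 1 covariance from an earlier iteration. The NumPy sketch below reproduces only that shape mismatch; the 3-feature test matrix and the sigma_stale value are hypothetical stand-ins, not the data from the pickle. It also shows that a 0 x 0 covariance, consistent with the final set of kept features, makes the same expression return one zero per test sample.

```python
import numpy as np

# Hypothetical stand-ins matching the shapes in the traceback:
# 97 test samples, every feature pruned, and a stale 1 x 1 sigma_.
n_test, n_features = 97, 3
X_test = np.ones((n_test, n_features))
keep_lambda = np.zeros(n_features, dtype=bool)  # all features pruned
sigma_stale = np.array([[0.5]])                 # left over from an earlier iteration

X_kept = X_test[:, keep_lambda]                 # shape (97, 0)


def sigmas_squared(X_kept, sigma):
    """Per-sample x^T Sigma x, the expression that fails in predict()."""
    return (np.dot(X_kept, sigma) * X_kept).sum(axis=1)


try:
    sigmas_squared(X_kept, sigma_stale)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# A covariance consistent with the final (empty) set of kept features is 0 x 0,
# and the same expression then returns a zero for every test sample.
sigma_consistent = np.zeros((0, 0))
terms = sigmas_squared(X_kept, sigma_consistent)
print(terms.shape, terms.max())
```

If predict then adds the noise variance 1 / alpha_ before taking the square root, as the scikit-learn code of that period appears to do, the returned stds would equal sqrt(1 / alpha_) rather than exactly zero.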

Versions

Windows-10-10.0.15063-SP0
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 12:30:02) [MSC v.1900 64 bit (AMD64)]
NumPy 1.13.3
SciPy 0.19.1
Scikit-Learn 0.20.dev0

cc @glemaitre

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 28 (28 by maintainers)

Top GitHub Comments

1 reaction
glemaitre commented, Nov 20, 2017

So in the current code, the only thing missing is to update coef_ and sigma_ once the for loop has finished.

So I would probably move the estimation of coef_ and lambda_ into a function, and call it inside the for loop and once more after the loop.
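A minimal sketch of that refactoring, written in plain NumPy rather than against the actual bayes.py: update_sigma, update_coef and ard_sketch are hypothetical names, the updates are the usual MacKay-style fixed-point rules for ARD, and the defaults and the coefficient-change stopping rule are assumptions chosen to resemble ARDRegression's. What it illustrates is the point of the comment: the same helpers run inside the loop and once more after it, so sigma_ and coef_ always match the final keep_lambda, even when every feature ends up pruned.

```python
import numpy as np


def update_sigma(X, alpha_, lambda_, keep_lambda):
    """Posterior covariance of the coefficients that are still kept (hypothetical helper)."""
    if not np.any(keep_lambda):
        # Every feature was pruned: the consistent covariance is 0 x 0.
        return np.zeros((0, 0))
    X_k = X[:, keep_lambda]
    precision = np.diag(lambda_[keep_lambda]) + alpha_ * (X_k.T @ X_k)
    return np.linalg.pinv(precision)


def update_coef(X, y, alpha_, sigma_, keep_lambda):
    """Posterior mean; coefficients of pruned features are exactly zero (hypothetical helper)."""
    coef_ = np.zeros(X.shape[1])
    coef_[keep_lambda] = alpha_ * (sigma_ @ (X[:, keep_lambda].T @ y))
    return coef_


def ard_sketch(X, y, n_iter=300, tol=1e-3, threshold_lambda=1e4,
               alpha_1=1e-6, alpha_2=1e-6, lambda_1=1e-6, lambda_2=1e-6):
    """Assumes X and y are already centered (as fit_intercept=True would do)."""
    n_samples, n_features = X.shape
    alpha_ = 1.0 / (np.var(y) + np.finfo(np.float64).eps)
    lambda_ = np.ones(n_features)
    keep_lambda = np.ones(n_features, dtype=bool)
    coef_ = np.zeros(n_features)

    for _ in range(n_iter):
        sigma_ = update_sigma(X, alpha_, lambda_, keep_lambda)
        coef_old = coef_
        coef_ = update_coef(X, y, alpha_, sigma_, keep_lambda)

        # Fixed-point updates for the noise precision alpha_ and the
        # per-feature precisions lambda_ (only for features still kept)
        rmse = np.sum((y - X @ coef_) ** 2)
        gamma = 1.0 - lambda_[keep_lambda] * np.diag(sigma_)
        lambda_[keep_lambda] = (gamma + 2 * lambda_1) / (coef_[keep_lambda] ** 2 + 2 * lambda_2)
        alpha_ = (n_samples - gamma.sum() + 2 * alpha_1) / (rmse + 2 * alpha_2)

        # Prune features whose precision has become very large
        keep_lambda = lambda_ < threshold_lambda
        coef_[~keep_lambda] = 0.0

        if np.sum(np.abs(coef_old - coef_)) < tol:
            break

    # The step the comment says is missing: recompute sigma_ and coef_ for the
    # final keep_lambda so that predict() sees matching shapes.
    sigma_ = update_sigma(X, alpha_, lambda_, keep_lambda)
    coef_ = update_coef(X, y, alpha_, sigma_, keep_lambda)
    return coef_, sigma_, keep_lambda, alpha_


# Usage: a constant target becomes all zeros after centering
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Xc = X - X.mean(axis=0)
y = np.full(20, 3.0)
coef_, sigma_, keep_lambda, alpha_ = ard_sketch(Xc, y - y.mean())
print(keep_lambda, sigma_.shape)
```

Without the two calls after the loop, a run that stops right after pruning would return a sigma_ sized for the features kept before the last update, which is the kind of mismatch seen in the traceback. np.linalg.pinv is used only for simplicity.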

I also saw that we use a different criterion to stop the iteration. In Tipping's code, it would correspond to:

np.max(np.abs(np.log(lambda_[keep_lambda]) - np.log(old_lambda_[keep_lambda])))

I am not sure which is best.
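The two stopping rules can be compared side by side in a small sketch. coef_change_converged is assumed to mirror the current rule (total absolute change in the coefficients between iterations), and log_lambda_converged implements the Tipping-style expression from the comment; both function names and the tol default are hypothetical. One detail matters for this issue: np.max raises a ValueError on an empty array, so once keep_lambda is all False the log-lambda rule needs an explicit guard.

```python
import numpy as np


def coef_change_converged(coef_old, coef_, tol=1e-3):
    """Assumed current-style rule: total absolute change of the coefficients."""
    return np.sum(np.abs(coef_old - coef_)) < tol


def log_lambda_converged(lambda_, old_lambda_, keep_lambda, tol=1e-3):
    """Tipping-style rule: largest change of log(lambda_) over the kept features."""
    if not np.any(keep_lambda):
        # All features pruned (the situation in this issue): nothing left to
        # update, and np.max over an empty array would raise ValueError.
        return True
    delta = np.abs(np.log(lambda_[keep_lambda]) - np.log(old_lambda_[keep_lambda]))
    return np.max(delta) < tol


old = np.array([1.0, 2.0, 5e4])
new = np.array([1.0005, 2.0, 6e4])
keep = np.array([True, True, False])  # third feature already pruned
print(log_lambda_converged(new, old, keep))                      # True
print(log_lambda_converged(new, old, np.zeros(3, dtype=bool)))   # True (guarded)
```

Restricting the comparison to kept features matters because lambda_ of pruned features keeps growing, so their log-change would never settle.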

1 reaction
glemaitre commented, Nov 16, 2017
Naively, I would do that, but I would also check the original paper cited in the user guide to make sure we don't get it wrong. @agramfort surely knows the algorithm better than I do.