Improve MinCovDet.fit error when covariance is zero
OK, this is extremely weird. Can someone run this code and see if it crashes with that error?
import numpy as np
from sklearn.covariance import MinCovDet
clf = MinCovDet()
data = np.array([0.5, 0.1, 0.1, 0.1, 0.957, 0.1, 0.1,
0.1, 0.4285, 0.1]).reshape(-1, 1)
clf.fit(data)
If I change the array to this:
data = np.array([0.5, 0.11, 0.1, 0.1, 0.957, 0.1, 0.1,
0.1, 0.4285, 0.1]).reshape(-1, 1)
Then it runs fine.
But it seems to crash with any array that contains too many identical values.
This array crashes as well:
data = np.array([0.5, 0.3, 0.3, 0.3, 0.957, 0.3, 0.3,
0.3, 0.4285, 0.3]).reshape(-1, 1)
I already checked for NaNs and everything; there's nothing.
Using Python 3.6.2, Pandas 0.20.3, NumPy 1.13.1, scikit-learn 0.19.0.
Thanks
Issue Analytics
- State:
- Created 6 years ago
- Comments:10 (7 by maintainers)
Top GitHub Comments
In this case, the estimated covariance matrix of the support data is equal to 0, and therefore the determinant is equal to 0. We thus have the minimum covariance determinant, and the algorithm should stop, as explained in the original paper. The covariance is equal to 0 in your case because the support data are the points that all share the same value. If there are fewer ties, the support data contain different values and everything works fine. If you don’t want the robust covariance to be estimated as 0, you may want to increase the support_fraction parameter to increase the number of support data.
To be honest, just saying “det(cov) = 0, try to increase support_fraction” may be a good enough error message. PR more than welcome!
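The explanation above can be illustrated with a small NumPy-only sketch. It is not scikit-learn's implementation: the helper min_variance_window is hypothetical, and the support sizes 6 and 9 are assumptions chosen to stand in for "roughly half of the 10 samples" and "a larger support_fraction". It picks the h sorted values with the smallest variance, which is what a minimum covariance determinant amounts to for one-dimensional data.

```python
import numpy as np


def min_variance_window(values, h):
    """Hypothetical helper: among all runs of h consecutive sorted values,
    return (variance, run) for the run with the smallest variance."""
    x = np.sort(np.asarray(values, dtype=float).ravel())
    # Every candidate support set is a contiguous run in the sorted data.
    windows = [x[i:i + h] for i in range(len(x) - h + 1)]
    variances = [float(np.var(w)) for w in windows]
    best = int(np.argmin(variances))
    return variances[best], windows[best]


# First array from the report: seven of the ten values are 0.1.
data = np.array([0.5, 0.1, 0.1, 0.1, 0.957, 0.1, 0.1,
                 0.1, 0.4285, 0.1])

# Assumed small support (6 points): the best run contains only tied values,
# so its variance is zero up to floating-point rounding.
var_default, subset_default = min_variance_window(data, 6)

# Assumed larger support (9 points): the run must include distinct values,
# so its variance is strictly positive.
var_larger, subset_larger = min_variance_window(data, 9)

print("support of 6:", subset_default, "variance:", var_default)
print("support of 9:", subset_larger, "variance:", var_larger)
```

With the smaller support, every selected point is one of the tied values, which is the degenerate case described above. In scikit-learn the corresponding knob is the support_fraction argument of MinCovDet.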