validation check for precomputed gram matrix fails erroneously when using float32 data
Describe the bug
A validation check for the precomputed Gram matrix was introduced in version 1.0.0 (https://github.com/scikit-learn/scikit-learn/pull/19004).
This check sometimes fails spuriously when the matrix has dtype float32 and the arbitrarily selected feature columns are sparse.
Code snippet to reproduce attached.
I could open a PR in the next few days to fix this, if wanted.
Steps/Code to Reproduce
from sklearn.linear_model import LassoCV
import numpy as np
m = LassoCV()
np.random.seed(seed=3)
X = np.random.random((10000, 50)).astype(np.float32)
X[:, 25] = np.where(X[:, 25] < 0.98, 0, 1)
X[:, 26] = np.where(X[:, 26] < 0.98, 0, 1)
y = np.random.random((10000, 1)).astype(np.float32)
m.fit(X, y)
Expected Results
No Exception thrown
Actual Results
ValueError: Gram matrix passed in via ‘precompute’ parameter did not pass validation when a single element was checked - please check that it was computed properly. For element (25,26) we computed -0.4163646101951599 but the user-supplied value was -0.41635191440582275.
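The size of the mismatch in the message above is consistent with ordinary float32 rounding rather than a wrongly computed matrix. The following is a minimal sketch, not scikit-learn's own code: it centers the columns by hand (an assumption about what happens before the check), computes element (25, 26) once as part of a full float32 matrix product and once as a single float32 dot product, and compares both with a float64 reference. The tolerances `rtol=1e-7, atol=1e-5` and `rtol=1e-4` are assumed values chosen for illustration.

```python
import numpy as np

# Same data as in the reproduction snippet, built with a local RandomState.
rng = np.random.RandomState(3)
X = rng.random_sample((10000, 50)).astype(np.float32)
for col in (25, 26):
    X[:, col] = np.where(X[:, col] < 0.98, 0, 1)

# Center the columns by hand (assumption: mimics fitting with an intercept).
Xc32 = X - X.mean(axis=0)
Xc64 = X.astype(np.float64)
Xc64 -= Xc64.mean(axis=0)

gram32 = Xc32.T @ Xc32                          # full Gram matrix, float32
single32 = np.dot(Xc32[:, 25], Xc32[:, 26])     # one element, float32
reference64 = np.dot(Xc64[:, 25], Xc64[:, 26])  # same element, float64

print("float32 matrix product:", gram32[25, 26])
print("float32 single dot:    ", single32)
print("float64 reference:     ", reference64)

# Assumed tolerances, for illustration only.
strict = np.isclose(single32, gram32[25, 26], rtol=1e-7, atol=1e-5)
relaxed = np.isclose(single32, gram32[25, 26], rtol=1e-4, atol=1e-5)
print("passes strict check: ", strict)
print("passes relaxed check:", relaxed)
```

The two float32 results may differ in their last few digits because the summation order differs between the two routes. Whether the strict comparison fails on a given machine depends on the BLAS implementation in use.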
Versions
1.0.1
Issue Analytics
- State:
- Created 2 years ago
- Comments:7 (4 by maintainers)
Top GitHub Comments
I created a PR (https://github.com/scikit-learn/scikit-learn/pull/22008) to fix this issue. The Gram matrix check is unnecessary when the matrix was not provided by the user but was computed by coordinate descent itself. Additionally, it would make sense to increase the tolerance values when the dtype of the matrices is float32.
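The two ideas in this comment can be sketched roughly as follows. This is not the code from the PR: `check_gram_element`, its `user_supplied` flag, and the tolerance values are hypothetical and only illustrate skipping the check for internally computed matrices and loosening the relative tolerance for float32 input.

```python
import numpy as np


def check_gram_element(X, gram, i, j, user_supplied=True):
    """Hypothetical helper: validate one entry of a precomputed Gram matrix."""
    if not user_supplied:
        # The matrix was computed internally, so there is nothing to validate.
        return
    # Assumed tolerances: looser for float32, stricter for float64.
    rtol = 1e-4 if X.dtype == np.float32 else 1e-7
    expected = np.dot(X[:, i], X[:, j])
    actual = gram[i, j]
    if not np.isclose(expected, actual, rtol=rtol, atol=1e-5):
        raise ValueError(
            f"Gram matrix element ({i},{j}) is {actual}, expected {expected}."
        )


# Usage: a float32 Gram matrix computed from X itself should be accepted.
rng = np.random.RandomState(0)
X32 = rng.random_sample((1000, 10)).astype(np.float32)
check_gram_element(X32, X32.T @ X32, 5, 6)
```

Choosing the tolerance from the dtype keeps the check strict for float64 input while still catching a Gram matrix that is wrong by more than rounding error.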
@QuantHao #22208 is stalled as there is no test added. Can you take over the PR and add the necessary tests so we can consider merging? 🙏