check_array(X, dtype='numeric') should fail if X has strings
Currently, dtype='numeric' is defined as "dtype is preserved unless array.dtype is object". This seems overly lenient and strange behaviour: in #9342, @qinhanmin2014 shows that check_array(['a', 'b', 'c'], dtype='numeric') works without error and produces an array of strings! This behaviour is not tested, and it's hard to believe that it is useful or intended. Perhaps we need a deprecation cycle, but I think dtype='numeric' should raise an error, or attempt to coerce, if the data does not actually have a numeric, real-valued dtype.
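A minimal sketch of what the stricter behaviour proposed above could look like, written against plain NumPy rather than sklearn. as_numeric_array and its coerce flag are hypothetical names, not sklearn's API. Treating bool as numeric, and rejecting complex numbers as not real-valued, are assumptions of this sketch.

```python
import numpy as np

# Dtype kinds accepted as "numeric, real-valued" in this sketch (an assumption):
# b = bool, i = signed int, u = unsigned int, f = float. Complex ('c') is rejected.
NUMERIC_KINDS = "biuf"


def as_numeric_array(X, coerce=True):
    """Hypothetical stricter replacement for dtype='numeric' (not sklearn's API).

    Numeric, real-valued input keeps its dtype. Object, unicode, and bytes
    input is coerced to float64 when coerce=True; everything else is rejected.
    """
    array = np.asarray(X)
    if array.dtype.kind in NUMERIC_KINDS:
        return array
    if coerce and array.dtype.kind in "OUS":
        try:
            # Strings such as "1.5" convert cleanly; "a" raises ValueError.
            return array.astype(np.float64)
        except (TypeError, ValueError) as exc:
            raise ValueError(
                f"dtype='numeric' requires numeric data; could not convert "
                f"array with dtype {array.dtype} to float64"
            ) from exc
    raise ValueError(
        f"dtype='numeric' requires a numeric, real-valued dtype, got {array.dtype}"
    )
```

With this sketch, as_numeric_array(['a', 'b', 'c']) raises ValueError instead of silently returning strings, while as_numeric_array(['1.5', '2']) returns a float64 array. Passing coerce=False selects the "raise an error" option from the issue rather than the "attempt to coerce" option.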
Issue details
- Created 6 years ago
- Comments: 14 (11 by maintainers)
Top GitHub Comments
I’m confused about the fix here. I’m using sklearn 0.23.2, and the behavior that @jnothman called out as a problem is still the same as he described. To reproduce:
Now everything’s a string. It looks like the warning message was added in 2018 but the behavior was never changed. Am I missing something?
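To see why string data comes back unchanged, here is a sketch of the rule quoted in the issue ("dtype is preserved unless array.dtype is object"), using plain NumPy. lenient_numeric is a hypothetical helper that mimics the described rule; it is not sklearn's actual code.

```python
import numpy as np


def lenient_numeric(X):
    """Hypothetical helper mimicking the rule quoted in the issue:
    the dtype is preserved unless it is object, in which case the
    array is converted to float64."""
    array = np.asarray(X)
    if array.dtype.kind == "O":
        array = array.astype(np.float64)
    return array


# A list of Python strings becomes a fixed-width unicode array, not an
# object array, so the object-only conversion branch never runs.
print(np.asarray(["a", "b", "c"]).dtype)   # <U1
print(lenient_numeric(["a", "b", "c"]))    # ['a' 'b' 'c']
```

Only genuinely object-typed input reaches the float64 conversion, so a rule keyed on the object dtype alone cannot catch unicode ('U') or bytes ('S') arrays.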
We wouldn't deprecate check_array entirely, but we would warn for two releases that "In the future, this data with dtype('Uxx') would be rejected because it is not of a numeric dtype."
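The warning proposed above could be sketched as follows. check_numeric_with_deprecation is a hypothetical helper, not sklearn's API. It keeps the existing behaviour (data returned unchanged, object arrays converted to float64) and only adds a FutureWarning for string, bytes, and void dtypes (kinds 'U', 'S', 'V'); which dtypes the warning should target is an assumption of this sketch.

```python
import warnings

import numpy as np


def check_numeric_with_deprecation(X):
    """Hypothetical helper sketching the proposed deprecation cycle:
    behaviour is unchanged, but non-numeric string-like dtypes trigger
    a FutureWarning announcing that they will be rejected later."""
    array = np.asarray(X)
    if array.dtype.kind == "O":
        # Object arrays are still converted, as under the existing rule.
        return array.astype(np.float64)
    if array.dtype.kind in "USV":
        warnings.warn(
            f"In the future, this data with dtype({array.dtype.str!r}) would be "
            "rejected because it is not of a numeric dtype.",
            FutureWarning,
            stacklevel=2,
        )
    return array
```

FutureWarning is used here rather than DeprecationWarning because Python's default warning filters hide DeprecationWarning outside __main__, whereas FutureWarning is shown to end users, who are the ones passing string data.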