check_array forces finite values for non-numeric data
Description
When check_array is given dtype=None, it still tries to force finite values.
Steps/Code to Reproduce
import numpy as np
import pandas as pd
from sklearn.utils import check_array

x = pd.DataFrame({
    'val': [
        np.zeros((1, 2))
    ]
}).values
check_array(x, dtype=None)
check_array(x, dtype=object)  # This fails too.
Expected Results
Either no error is thrown and values are not checked for finiteness, or no error is thrown and values are checked for finiteness.
Actual Results
  File "error.py", line 14, in <module>
    check_X_y(x, y, dtype=None)#, force_all_finite=False)
  File ".../sklearn/utils/validation.py", line 719, in check_X_y
    estimator=estimator)
  File ".../sklearn/utils/validation.py", line 542, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File ".../sklearn/utils/validation.py", line 59, in _assert_all_finite
    if _object_dtype_isnan(X).any():
AttributeError: 'bool' object has no attribute 'any'
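The commented-out force_all_finite=False on the check_X_y line of the traceback hints at a workaround: with the finiteness check disabled, the object array should pass validation. The following is a minimal sketch, assuming scikit-learn and NumPy are installed. The array is built with NumPy alone to mirror the DataFrame's .values from the reproduction. The lookup of `ensure_all_finite` is an assumption that newer releases expose the flag under that name; the 0.21.3 release in this report uses `force_all_finite`.

```python
import inspect

import numpy as np
from sklearn.utils import check_array

# Mirror the reported input without pandas: a (1, 1) object array
# whose single cell holds a (1, 2) ndarray.
x = np.empty((1, 1), dtype=object)
x[0, 0] = np.zeros((1, 2))

# The report's version (0.21.3) names the flag `force_all_finite`.
# Assumption: newer releases call it `ensure_all_finite`, so use
# whichever name the installed check_array accepts.
params = inspect.signature(check_array).parameters
flag = "ensure_all_finite" if "ensure_all_finite" in params else "force_all_finite"

# With the finiteness check off, the NaN test on the object array is skipped.
checked = check_array(x, dtype=None, **{flag: False})
print(checked.shape, checked.dtype)
```

This only sidesteps the check. It does not validate the nested values in any way.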
Versions
System:
python: 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]
executable: /opt/anaconda3/bin/python
machine: Linux-3.10.0-957.21.3.el7.x86_64-x86_64-with-centos-7.6.1810-Core
BLAS:
macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
lib_dirs: /opt/anaconda3/lib
cblas_libs: mkl_rt, pthread
Python deps:
pip: 19.1.1
setuptools: 41.0.1
sklearn: 0.21.3
numpy: 1.16.4
scipy: 1.3.0
Cython: 0.29.12
pandas: 0.24.2
Linux-3.10.0-957.21.3.el7.x86_64-x86_64-with-centos-7.6.1810-Core
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0]
NumPy 1.16.4
SciPy 1.3.0
Scikit-Learn 0.21.2
Issue Analytics
- State:
- Created 4 years ago
- Comments:8 (6 by maintainers)
If I take the issue reported in imbalanced-learn, I think that it is our implementation which fails. We just need to turn the flag off.

Since our primary use case for object dtype is strings, I think that the current defaults are OK.
That behaves consistently IMO, i.e. it raises an error on NaN since force_all_finite=True, and doing that by default in a string array makes sense, I think.

The question is whether we need to change the behavior for more complex object dtypes; that seems likely.
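To make the "more complex object dtypes" question concrete, here is one possible shape for a NaN check that always returns a boolean array, even when cells hold nested arrays. It is only a sketch: `object_nan_mask` is a hypothetical helper, not part of scikit-learn and not the fix the maintainers adopted. It assumes nested containers should be treated as opaque values and never flagged.

```python
import numpy as np


def object_nan_mask(X):
    """Return a boolean mask marking float NaN scalars in an object array.

    Hypothetical helper for illustration only. Cells holding anything other
    than a float scalar (strings, ndarrays, ...) are never flagged.
    """
    X = np.asarray(X, dtype=object)
    mask = np.zeros(X.shape, dtype=bool)
    for idx, value in np.ndenumerate(X):
        # NaN is the only float value that is not equal to itself.
        if isinstance(value, (float, np.floating)) and value != value:
            mask[idx] = True
    return mask


# The kind of input from the report: one cell holding a (1, 2) ndarray.
nested = np.empty((1, 1), dtype=object)
nested[0, 0] = np.zeros((1, 2))

# The primary use case mentioned above: strings, possibly with missing values.
strings = np.array([["a", np.nan], ["b", "c"]], dtype=object)

print(object_nan_mask(nested).any())
print(object_nan_mask(strings))
```

Because the mask is built cell by cell, calling .any() on the result is always valid. The trade-off is a Python-level loop, which is slower than a vectorized comparison on large arrays.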