check_array forces finite values for non-numeric data

Description

When check_array is given dtype=None, it still checks that all values are finite, even when the data is non-numeric.

Steps/Code to Reproduce

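import numpy as np
import pandas as pd
from sklearn.utils import check_array

# One-column DataFrame whose only cell holds a 1x2 float array;
# .values turns it into a (1, 1) array of dtype object.
x = pd.DataFrame({
    'val': [
        np.zeros((1, 2))
    ]
}).values

check_array(x, dtype=None)    # Fails inside _assert_all_finite (see Actual Results).
check_array(x, dtype=object)  # This fails too.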
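The pandas round trip is only a convenient way to build the input. The sketch below uses NumPy alone to build what should be an equivalent array, and prints its structure: a 1x1 array of dtype object whose single cell is itself a 1x2 float array. The variable name x simply mirrors the reproduction.

```python
import numpy as np

# Allocate a 1x1 object array, then store a whole 1x2 float array in
# its single cell (full integer indexing on an object array stores the
# right-hand side as one Python object).
x = np.empty((1, 1), dtype=object)
x[0, 0] = np.zeros((1, 2))

print(x.shape, x.dtype)                       # (1, 1) object
print(type(x[0, 0]).__name__, x[0, 0].shape)  # ndarray (1, 2)
```

Each cell of x is a full array rather than a number, which is why a per-cell finiteness test has no obvious meaning here.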

Expected Results

Either no error is thrown and the values are not checked for finiteness, or no error is thrown and the values are checked for finiteness.

Actual Results

Traceback (most recent call last):
  File "error.py", line 14, in <module>
    check_X_y(x, y, dtype=None)#, force_all_finite=False)
  File ".../sklearn/utils/validation.py", line 719, in check_X_y
    estimator=estimator)
  File ".../sklearn/utils/validation.py", line 542, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File ".../sklearn/utils/validation.py", line 59, in _assert_all_finite
    if _object_dtype_isnan(X).any():
AttributeError: 'bool' object has no attribute 'any'
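The last frame shows that _object_dtype_isnan(X) evidently returned a plain bool instead of an elementwise array, so .any() is not available. Comparing two cells of this input gives a 1x2 boolean array rather than a single bool, so NumPy cannot assemble one result per cell. A more defensive check, sketched here as a hypothetical helper (object_isnan is an illustrative name, not a scikit-learn function), inspects each cell on its own and only treats floating-point scalars as possible NaNs:

```python
import math

import numpy as np


def object_isnan(X):
    """Hypothetical NaN mask for an object-dtype array.

    Each cell is inspected on its own, so the result always has X's
    shape, even when cells hold nested arrays.
    """
    X = np.asarray(X, dtype=object)
    mask = np.zeros(X.shape, dtype=bool)
    for idx, value in np.ndenumerate(X):
        # Only floating-point scalars can be NaN; strings, None and
        # nested arrays are reported as "not NaN".
        if isinstance(value, (float, np.floating)):
            mask[idx] = math.isnan(value)
    return mask


# The input from the reproduction: one cell holding a 1x2 float array.
x = np.empty((1, 1), dtype=object)
x[0, 0] = np.zeros((1, 2))
print(object_isnan(x).any())  # False

# A string array with one missing value.
s = np.array(["xxx", "yyy", np.nan], dtype=object)
print(object_isnan(s).tolist())  # [False, False, True]
```

This sketch deliberately does not look inside nested arrays; whether such cells should be checked for finiteness at all is the open question in the discussion.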

Versions

System:
    python: 3.7.3 (default, Mar 27 2019, 22:11:17)  [GCC 7.3.0]
executable: /opt/anaconda3/bin/python
   machine: Linux-3.10.0-957.21.3.el7.x86_64-x86_64-with-centos-7.6.1810-Core

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /opt/anaconda3/lib
cblas_libs: mkl_rt, pthread

Python deps:
       pip: 19.1.1
setuptools: 41.0.1
   sklearn: 0.21.3
     numpy: 1.16.4
     scipy: 1.3.0
    Cython: 0.29.12
    pandas: 0.24.2
Linux-3.10.0-957.21.3.el7.x86_64-x86_64-with-centos-7.6.1810-Core
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0]
NumPy 1.16.4
SciPy 1.3.0
Scikit-Learn 0.21.2

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 8 (6 by maintainers)

Top GitHub Comments

1 reaction
glemaitre commented, Sep 23, 2019

Taking the case reported in imbalanced-learn, I think it is our implementation that fails. We just need to turn the flag off.

Since our primary use case for object dtype is strings, I think the current defaults are OK.

0 reactions
rth commented, Sep 23, 2019

check_array(['xxx', 'yyy', np.nan], dtype=object, ensure_2d=False)

That behaves consistently IMO, i.e. it raises an error on NaN since force_all_finite=True, and doing that by default for a string array makes sense, I think.
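The NaN check succeeds for this string array presumably because the object-dtype path relies on the X != X idiom, as the _object_dtype_isnan frame in the traceback suggests: NaN is the only value that compares unequal to itself, so X != X is an elementwise NaN mask whenever every cell is a scalar. A small NumPy-only sketch of that idea, independent of check_array:

```python
import numpy as np

# Strings plus a missing value, as in the check_array call above.
X = np.array(["xxx", "yyy", np.nan], dtype=object)

# NaN is the only value that is not equal to itself, so for an object
# array of scalars this comparison is an elementwise NaN mask.
mask = X != X
print(mask.tolist())  # [False, False, True]
print(mask.any())     # True: a finiteness check would reject X
```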

The question is whether we need to change the behavior for more complex object dtypes; it seems likely.
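One possible reading of "more complex object dtypes" is an object array whose cells are not all scalars, such as the nested arrays in the reproduction. A hypothetical helper (holds_only_scalars is an illustrative name, not part of scikit-learn) could tell the two cases apart, for example to decide whether to keep the finiteness check on:

```python
import numbers

import numpy as np


def holds_only_scalars(X):
    """Hypothetical check: True if every cell of an object array is a
    string, a number or None, i.e. the "primary use case" of object
    dtype mentioned in the discussion."""
    X = np.asarray(X, dtype=object)
    return all(
        cell is None or isinstance(cell, (str, numbers.Number))
        for cell in X.ravel()
    )


strings = np.array(["xxx", "yyy", np.nan], dtype=object)

nested = np.empty((1, 1), dtype=object)
nested[0, 0] = np.zeros((1, 2))

print(holds_only_scalars(strings))  # True
print(holds_only_scalars(nested))   # False
```

Under this sketch, a caller could keep the NaN check for string arrays, as rth suggests, and skip it for nested cells by passing the result as the force_all_finite argument that appears in the traceback. The parameter name may differ in other scikit-learn releases, so check the documentation for your version.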
