pandas dtypes "boolean" not supported in classification target
Describe the bug
pandas extends NumPy’s dtypes with its own extension dtypes, and not all of them are supported as targets for scikit-learn classifiers. In particular, if the target y has the pandas nullable “boolean” dtype, a classifier such as LogisticRegression fails, whereas a target with the NumPy “bool” dtype works.
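A minimal sketch of the dtype difference, using only public pandas calls: the “boolean” Series carries a pandas extension dtype (`BooleanDtype`) that can store missing values as `pd.NA`, while the “bool” Series uses a plain NumPy dtype. The variable names are illustrative only.

```python
import pandas as pd

# Same values, two different dtypes.
nullable = pd.Series([False, True, False, True], dtype="boolean")
plain = pd.Series([False, True, False, True], dtype="bool")

print(nullable.dtype)  # boolean (pandas BooleanDtype extension dtype)
print(plain.dtype)     # bool (NumPy dtype)

# Only the extension dtype can store a missing value; None becomes pd.NA.
with_missing = pd.Series([True, None, False], dtype="boolean")
print(with_missing.isna().tolist())  # [False, True, False]
```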
Steps/Code to Reproduce
```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({"boolean_col": pd.Series([False, True, False, True], dtype="boolean"),
                   "bool_col": pd.Series([False, True, False, True], dtype="bool"),
                   "num_col": pd.Series([1, 2, 3, 4])})
clf = LogisticRegression()
```
Expected Results
```
>>> clf.fit(df[["num_col"]], df.bool_col)
LogisticRegression()
```
Actual Results
```
>>> clf.fit(df[["num_col"]], df.boolean_col)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-9a0c5abf260e> in <module>
----> 1 clf.fit(df[["num_col"]], df.boolean_col)

~\AppData\Local\Continuum\anaconda3\envs\lognode\lib\site-packages\sklearn\linear_model\_logistic.py in fit(self, X, y, sample_weight)
   1343                                    order="C",
   1344                                    accept_large_sparse=solver != 'liblinear')
-> 1345         check_classification_targets(y)
   1346         self.classes_ = np.unique(y)
   1347

~\AppData\Local\Continuum\anaconda3\envs\lognode\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
    170     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    171                       'multilabel-indicator', 'multilabel-sequences']:
--> 172         raise ValueError("Unknown label type: %r" % y_type)
    173
    174

ValueError: Unknown label type: 'unknown'
```
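One way to sidestep the failure, sketched here under the assumption that the target has no missing values: convert the nullable Series to a plain NumPy bool array before calling `fit`. `as_numpy_bool_target` is a hypothetical helper written for this example, not part of pandas or scikit-learn; the conversion itself is done by pandas’ `Series.to_numpy(dtype=bool)`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def as_numpy_bool_target(y: pd.Series) -> np.ndarray:
    """Hypothetical helper: turn a pandas "boolean" target into a NumPy bool array.

    Raises ValueError if the target contains missing values, because a NumPy
    bool array cannot represent pd.NA.
    """
    if y.isna().any():
        raise ValueError("target contains missing values; drop or impute them first")
    return y.to_numpy(dtype=bool)


df = pd.DataFrame({
    "boolean_col": pd.Series([False, True, False, True], dtype="boolean"),
    "num_col": pd.Series([1, 2, 3, 4]),
})

clf = LogisticRegression()
# Fit on the converted target instead of the nullable Series.
clf.fit(df[["num_col"]], as_numpy_bool_target(df["boolean_col"]))
print(clf.classes_)
```

The explicit `isna()` check is a deliberate choice in this sketch: since a NumPy bool array has no slot for `pd.NA`, failing loudly seems safer than silently mapping missing labels to `False`.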
Versions
```
System:
    python: 3.8.2 (default, Apr 14 2020, 19:01:40) [MSC v.1916 64 bit (AMD64)]
executable: ~\AppData\Local\Continuum\anaconda3\envs\lognode\python.exe
   machine: Windows-10-10.0.18362-SP0

Python dependencies:
       pip: 20.0.2
setuptools: 46.1.3.post20200330
   sklearn: 0.23.1
     numpy: 1.18.1
     scipy: 1.4.1
    Cython: None
    pandas: 1.0.3
matplotlib: 3.2.1
    joblib: 0.14.1
threadpoolctl: 2.1.0

Built with OpenMP: True
```
Top GitHub Comments
I can confirm it’s still present in 0.24.2.
Still failing with version 1.0.1 when using `precision_recall_curve` with boolean data. I debugged and found that something goes wrong in `utils.multiclass.type_of_target` (see earlier comment). There is still a `y = np.asarray(y)` there; is that the cause?
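The mechanism suggested in these comments can be checked directly with scikit-learn’s `type_of_target`. This sketch assumes, consistent with the `'unknown'` label type in the traceback, that in the reported pandas version `np.asarray` turned a “boolean” Series into an object-dtype array; it therefore builds that object array explicitly instead of relying on the conversion behaviour of any particular pandas release. It also shows `precision_recall_curve` accepting the same labels once they are a NumPy bool array.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve
from sklearn.utils.multiclass import check_classification_targets, type_of_target

labels = [False, True, False, True]

# A plain NumPy bool array is recognised as a binary target.
bool_y = np.array(labels, dtype=bool)
print(type_of_target(bool_y))  # binary

# The same values in an object-dtype array (assumed to be what np.asarray
# produced from a "boolean" Series in the reported pandas version) are not.
object_y = np.array(labels, dtype=object)
print(type_of_target(object_y))  # unknown

try:
    check_classification_targets(object_y)
except ValueError as exc:
    print("check_classification_targets raised:", exc)

# precision_recall_curve works once y_true is a NumPy bool array.
scores = np.array([0.1, 0.8, 0.3, 0.6])
precision, recall, thresholds = precision_recall_curve(bool_y, scores)
```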