OneHotEncoder - categories parameter not working as deprecated n_values parameter
When using the 'n_values' parameter, the code works as expected and I get the deprecation warning. However, when I change the code to what the deprecation warning suggests, I get a ValueError. This works:
import numpy as np
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(n_values=8, sparse=False)
o = ohe.fit_transform(np.array([[3, 5, 1]]))
o = o.reshape(1, 3, 8)
print(o)
[[[0. 0. 0. 1. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 1. 0. 0.]
  [0. 1. 0. 0. 0. 0. 0. 0.]]]
C:\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py:331: DeprecationWarning: Passing 'n_values' is deprecated in version 0.20 and will be removed in 0.22. You can use the 'categories' keyword instead. 'n_values=n' corresponds to 'categories=[range(n)]'.
  warnings.warn(msg, DeprecationWarning)
When I replace n_values with categories = [range(n)] and run the code as below, I get the error:
import numpy as np
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(categories=[range(8)], sparse=False)
o = ohe.fit_transform(np.array([[3, 5, 1]]))
o = o.reshape(1, 3, 8)
print(o)
ValueError                                Traceback (most recent call last)
<ipython-input-10-2c07126f3194> in <module>
      2 import pandas as pd
      3 ohe = OneHotEncoder(categories = [range(8)],sparse = False)
----> 4 o = ohe.fit_transform(np.array([[3, 5, 1]]))
      5 # pd.get_dummies(np.array([3, 5, 1]))
      6 o.reshape(1,3,8)

C:\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py in fit_transform(self, X, y)
    516                 self._categorical_features, copy=True)
    517         else:
--> 518             return self.fit(X).transform(X)
    519
    520     def _legacy_transform(self, X):

C:\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py in fit(self, X, y)
    427             return self
    428         else:
--> 429             self._fit(X, handle_unknown=self.handle_unknown)
    430             return self
    431

C:\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py in _fit(self, X, handle_unknown)
     70                                      "supported for numerical categories")
     71         if len(self._categories) != n_features:
---> 72             raise ValueError("Shape mismatch: if n_values is an array,"
     73                              " it has to be of shape (n_features,).")
     74

ValueError: Shape mismatch: if n_values is an array, it has to be of shape (n_features,).
Is this a usage error or a bug?
System:
    python: 3.7.1 | packaged by conda-forge | (default, Nov 13 2018, 19:01:41) [MSC v.1900 64 bit (AMD64)]
    executable: C:\Anaconda3\python.exe
    machine: Windows-10-10.0.16299-SP0

BLAS:
    macros:
    lib_dirs:
    cblas_libs: cblas

Python deps:
    pip: 18.1
    setuptools: 40.6.3
    sklearn: 0.20.3
    numpy: 1.15.4
    scipy: 1.1.0
    Cython: 0.29.2
    pandas: 0.23.4
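The traceback above fails on the check `len(self._categories) != n_features`, which suggests that `categories` needs one entry per input column. The input `[[3, 5, 1]]` is one sample with three features, so `[range(8)]` (a single entry) does not match. Here is a minimal sketch of two possible ways around this, not a confirmed maintainer answer. It assumes that relying on the default sparse output and calling `toarray()` is acceptable; the `sparse` keyword is left out on purpose because its name differs between scikit-learn versions.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([[3, 5, 1]])  # one sample, three features

# Option A: one category list per feature (three columns -> three lists).
ohe_a = OneHotEncoder(categories=[list(range(8))] * X.shape[1])
# The default output is a SciPy sparse matrix; toarray() makes it dense.
a = ohe_a.fit_transform(X).toarray().reshape(1, 3, 8)

# Option B: treat the values as three samples of a single feature,
# which is the case the warning's 'categories=[range(n)]' describes.
ohe_b = OneHotEncoder(categories=[list(range(8))])
b = ohe_b.fit_transform(X.reshape(-1, 1)).toarray()

print(a)
print(b)
```

Option A keeps the original input shape and should match the old `n_values=8` result after the reshape. Option B produces a `(3, 8)` array directly, with no reshape of the encoded output needed.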
Something like this worked for me.
Here I am giving 0 and 1 for a column named countries.

Hi, I am trying to implement the code from the link below, but at line number 259 I am getting the error below. Can anyone please help me get rid of it?