Inverse Transform for Label Encoder with mixed strings and numbers returns only strings
Description
With a LabelEncoder fitted with both string and numeric values, the inverse transform of that LabelEncoder will include only strings.
Steps/Code to Reproduce
from sklearn.preprocessing import LabelEncoder
le=LabelEncoder().fit([1, 2, 'a', 'b'])
le.inverse_transform([0, 1, 2, 3])
Expected Results
array([1, 2, 'a', 'b'], dtype=object)
I understand that numpy is not ideal for dealing with non-numeric data, so I don’t know what dtype the output SHOULD be, but I know that if the dtype is simply “object”, then it will differentiate between the strings and numbers.
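The point about the "object" dtype can be checked with NumPy alone. The snippet below is only a sketch of that claim and does not involve LabelEncoder at all; the variable names are illustrative.

```python
import numpy as np

# Requesting dtype=object makes NumPy store references to the original
# Python objects, so the integers and the strings stay distinct.
mixed = np.array([1, 2, 'a', 'b'], dtype=object)

element_types = [type(value).__name__ for value in mixed]
print(mixed.dtype)      # object
print(element_types)    # ['int', 'int', 'str', 'str']
```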
Actual Results
array(['1', '2', 'a', 'b'], dtype='<U11')
The array is no longer mixed type.
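The same coercion can be reproduced without scikit-learn, which suggests it comes from NumPy's default dtype inference. That the encoder converts its input to an array without an explicit dtype is an assumption here, not something confirmed in this report. A minimal sketch:

```python
import numpy as np

# With no dtype given, NumPy picks one common dtype for the whole list.
# For a mix of ints and strings that is a unicode string dtype, so the
# integers are converted to their string form.
coerced = np.array([1, 2, 'a', 'b'])

print(coerced.dtype.kind)   # U
print(coerced.tolist())     # ['1', '2', 'a', 'b']
```

The exact width of the string dtype (for example '<U11' versus '<U21') depends on the platform's default integer size, so only the dtype kind is printed.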
Versions
Windows-8.1-6.3.9600-SP0
Python 3.5.3 |Anaconda 4.4.0 (64-bit)| (default, May 15 2017, 10:43:23) [MSC v.1900 64 bit (AMD64)]
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.1
Issue Analytics
- Created 6 years ago
- Comments: 23 (16 by maintainers)
Informative warnings are usually very welcome!
Also see #17294.