
Inverse Transform for Label Encoder with mixed strings and numbers returns only strings


Description

When a LabelEncoder is fitted on a mix of string and numeric values, its inverse_transform returns only strings: the numeric labels come back as their string representations.

Steps/Code to Reproduce

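from sklearn.preprocessing import LabelEncoder

# Fit on a plain list that mixes ints and strings
le = LabelEncoder().fit([1, 2, 'a', 'b'])
# Map the integer codes back to the original labels
print(le.inverse_transform([0, 1, 2, 3]))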
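The strings most likely appear before any encoding happens: NumPy coerces a list that mixes ints and strings into a single Unicode string dtype, and LabelEncoder converts its input to an array during validation. The sketch below uses NumPy alone to show that coercion. It is an assumption about the root cause, not a trace through the scikit-learn source.

```python
import numpy as np

# Converting a mixed list to an array promotes every element to a string
labels = np.asarray([1, 2, 'a', 'b'])
print(labels, labels.dtype)  # dtype is a Unicode string type such as '<U21'

# Requesting dtype=object keeps the original Python objects instead
mixed = np.asarray([1, 2, 'a', 'b'], dtype=object)
print([type(v).__name__ for v in mixed])  # ['int', 'int', 'str', 'str']
```

The exact string width (`<U11` in the report above, `<U21` on most 64-bit Linux builds) depends on the platform, but the element type is a string either way.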

Expected Results

array([1, 2, 'a', 'b'], dtype=object)

I understand that NumPy is not ideal for non-numeric data, so I don’t know what dtype the output should be, but with a plain object dtype the array would keep the strings and numbers distinct.

Actual Results

array(['1', '2', 'a', 'b'], dtype='<U11')

The numeric labels 1 and 2 come back as the strings '1' and '2', so the array is no longer of mixed type.
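One possible workaround, sketched under the assumption that the labels are hashable scalars, is to keep the classes in a dtype=object array so the original Python objects survive the round trip. `MixedTypeLabelEncoder` is a hypothetical class written for this example; it is not part of scikit-learn and does not replicate LabelEncoder's validation or error handling.

```python
import numpy as np


class MixedTypeLabelEncoder:
    """Hypothetical encoder (not scikit-learn API) that stores the original
    label objects so inverse_transform can return ints and strings unchanged."""

    def fit(self, y):
        # dict.fromkeys removes duplicates; sorting by (type name, value)
        # avoids comparing an int with a str, which raises TypeError in Python 3
        unique = dict.fromkeys(y)
        ordered = sorted(unique, key=lambda v: (type(v).__name__, v))
        self.classes_ = np.array(ordered, dtype=object)
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, y):
        # Look up each label's integer code; unseen labels raise KeyError
        return np.array([self._index[label] for label in y], dtype=np.intp)

    def inverse_transform(self, codes):
        # Fancy indexing on an object array returns the stored Python objects
        return self.classes_[np.asarray(codes, dtype=np.intp)]


enc = MixedTypeLabelEncoder().fit([1, 2, 'a', 'b'])
result = enc.inverse_transform([0, 1, 2, 3])
print(repr(result))
```

Here `result` is `array([1, 2, 'a', 'b'], dtype=object)`, which matches the expected output in the report. Sorting by type name first is an arbitrary but deterministic choice; it places all ints before all strings.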

Versions

Windows-8.1-6.3.9600-SP0
Python 3.5.3 |Anaconda 4.4.0 (64-bit)| (default, May 15 2017, 10:43:23) [MSC v.1900 64 bit (AMD64)]
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.1

Issue Analytics

State:
Created 6 years ago
Comments: 23 (16 by maintainers)

Top GitHub Comments

1 reaction

jnothman commented, Nov 5, 2018

Informative warnings are usually very welcome!
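An informative warning of the kind suggested here could check the label types before fitting. The helper below, `warn_on_mixed_label_types`, is hypothetical and is not how scikit-learn implements any check. It only illustrates detecting strings mixed with other types and warning that the numbers would be converted to strings.

```python
import warnings


def warn_on_mixed_label_types(y):
    """Hypothetical helper (not scikit-learn API): warn when labels mix str
    with other types, since converting them to an array turns numbers into strings.
    Returns True if a warning was issued."""
    type_names = sorted({type(label).__name__ for label in y})
    has_str = any(isinstance(label, str) for label in y)
    if has_str and len(type_names) > 1:
        warnings.warn(
            f"Labels mix types {type_names}; numeric labels will be converted "
            "to strings and inverse_transform will not restore them.",
            UserWarning,
        )
        return True
    return False
```

Calling `warn_on_mixed_label_types([1, 2, 'a', 'b'])` emits a `UserWarning` and returns `True`. Labels that are all strings or all numbers pass silently.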

0 reactions

amueller commented, Jun 6, 2020

also see #17294
