Inverse Transform for Label Encoder with mixed strings and numbers returns only strings

Description

With a LabelEncoder fitted with both string and numeric values, the inverse transform of that LabelEncoder will include only strings.

Steps/Code to Reproduce

from sklearn.preprocessing import LabelEncoder

# Fit on a list that mixes integers and strings
le = LabelEncoder().fit([1, 2, 'a', 'b'])
le.inverse_transform([0, 1, 2, 3])

Expected Results

array([1, 2, 'a', 'b'], dtype=object)

I understand that NumPy is not ideal for non-numeric data, so I don't know what dtype the output should be. I do know that with dtype "object" the array would keep the strings and numbers distinct.

Actual Results

array(['1', '2', 'a', 'b'], dtype='<U11')

The array is no longer mixed type.
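
The coercion can be reproduced with NumPy alone. The sketch below is not taken from the issue. It assumes the input list is converted to an array without an explicit dtype somewhere inside fit, which would be consistent with the output above but is not confirmed by the report. The variable names are illustrative only.

```python
import numpy as np

mixed = [1, 2, 'a', 'b']

# Default conversion: NumPy picks one common dtype, so the ints become strings.
coerced = np.asarray(mixed)
print(coerced.dtype.kind)  # 'U' means a fixed-width unicode string dtype
print([type(v).__name__ for v in coerced.tolist()])

# Explicit object dtype: each element keeps its original Python type.
preserved = np.asarray(mixed, dtype=object)
print([type(v).__name__ for v in preserved.tolist()])
```

The exact string width in the coerced dtype (for example '<U11') depends on the platform's default integer size, so the sketch checks only the dtype kind.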

Versions

Windows-8.1-6.3.9600-SP0
Python 3.5.3 |Anaconda 4.4.0 (64-bit)| (default, May 15 2017, 10:43:23) [MSC v.1900 64 bit (AMD64)]
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.1

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Comments: 23 (16 by maintainers)

Top GitHub Comments

1 reaction
jnothman commented, Nov 5, 2018

Informative warnings are usually very welcome!

0 reactions
amueller commented, Jun 6, 2020

also see #17294
