ColumnTransformer not dropping string columns after encoding without Pipeline

Description

ColumnTransformer works properly when the transformer is a Pipeline, but not when the imputer and the encoder are passed as separate entries in the list of transformers.

Steps/Code to Reproduce

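import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({'a': ['v1', 'v2'],
                   'b': ['v1', 'v2'],
                   'c': [1, 2]})
cols = ['a', 'b']

# Two separate (name, transformer, columns) entries on the same columns.
# sparse=False is the 0.20 spelling; scikit-learn 1.2+ uses sparse_output=False.
imp = ('imp', SimpleImputer(strategy='constant'), cols)
ohe = ('ohe', OneHotEncoder(sparse=False), cols)
transformers = [imp, ohe]
ct = ColumnTransformer(transformers)
ct.fit_transform(df)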

Expected Results

array([[1., 0., 1., 0.],
       [0., 1., 0., 1.]])

Actual Results

array([['v1', 'v1', 1.0, 0.0, 1.0, 0.0],
       ['v2', 'v2', 0.0, 1.0, 0.0, 1.0]], dtype=object)

Correct results produced with a Pipeline

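from sklearn.pipeline import Pipeline

# Chain the steps: the encoder now receives the imputer's output, and the
# Pipeline is a single ColumnTransformer entry covering both columns.
imp2 = ('imp', SimpleImputer(strategy='constant'))
ohe2 = ('ohe', OneHotEncoder(sparse=False))
steps = [imp2, ohe2]
pipe = Pipeline(steps)
transformers2 = [('cat', pipe, cols)]
ct = ColumnTransformer(transformers2)
ct.fit_transform(df)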
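
A self-contained sketch of the working layout, assuming a recent scikit-learn and again using `sparse_threshold=0` (a documented ColumnTransformer parameter) in place of the version-specific encoder flag. Because the Pipeline is one ColumnTransformer entry, only the encoder's output is concatenated. It also shows what happens to the numeric column 'c': it is dropped under the default `remainder='drop'` and appended unchanged with `remainder='passthrough'`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"a": ["v1", "v2"], "b": ["v1", "v2"], "c": [1, 2]})
cols = ["a", "b"]

# Impute, then encode: the encoder sees the imputed values, and only its
# output reaches the ColumnTransformer.
cat_pipe = Pipeline([
    ("imp", SimpleImputer(strategy="constant")),
    ("ohe", OneHotEncoder()),
])

# Default remainder="drop": column 'c' is discarded.
ct_drop = ColumnTransformer([("cat", cat_pipe, cols)], sparse_threshold=0)
dropped = ct_drop.fit_transform(df)

# remainder="passthrough": column 'c' is appended after the encoded columns.
# ColumnTransformer clones cat_pipe on fit, so reusing it here is safe.
ct_keep = ColumnTransformer([("cat", cat_pipe, cols)],
                            remainder="passthrough", sparse_threshold=0)
kept = ct_keep.fit_transform(df)

print(dropped)  # 4 indicator columns
print(kept)     # 4 indicator columns followed by 'c'
```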

Versions

Darwin-17.7.0-x86_64-i386-64bit
Python 3.6.4 |Anaconda custom (64-bit)| (default, Mar 12 2018, 20:05:31) [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.14.3
SciPy 1.1.0
Scikit-Learn 0.20rc1

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

jnothman commented, Sep 6, 2018

“the features generated by each transformer will be concatenated to form the output feature matrix”
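
The quoted sentence explains the behaviour: every ColumnTransformer entry receives the original selected columns and the outputs are placed side by side, whereas a Pipeline feeds each step the previous step's output. The following NumPy-only sketch contrasts the two with hypothetical helpers (`impute`, `one_hot`, `apply_in_parallel`, `apply_in_sequence`); it illustrates the semantics and is not scikit-learn's implementation.

```python
import numpy as np

# The two string columns 'a' and 'b' from the report, with no missing values.
X = np.array([["v1", "v1"], ["v2", "v2"]], dtype=object)


def impute(block):
    # Hypothetical stand-in for SimpleImputer: nothing is missing, so the
    # original string values come back unchanged.
    return block.copy()


def one_hot(block):
    # Toy one-hot encoder: one 0/1 column per (input column, category) pair,
    # with categories in sorted order.
    out = []
    for j in range(block.shape[1]):
        for cat in sorted(set(block[:, j])):
            out.append(np.array([v == cat for v in block[:, j]], dtype=float))
    return np.column_stack(out)


def apply_in_parallel(X, steps):
    # ColumnTransformer-like: every step sees the ORIGINAL columns and the
    # results are concatenated side by side.
    return np.hstack([step(X) for step in steps])


def apply_in_sequence(X, steps):
    # Pipeline-like: each step consumes the previous step's output.
    for step in steps:
        X = step(X)
    return X


parallel = apply_in_parallel(X, [impute, one_hot])
sequence = apply_in_sequence(X, [impute, one_hot])
print(parallel)  # 6 columns: the 2 raw strings plus 4 indicators
print(sequence)  # 4 indicator columns only
```

The parallel result has the same shape and object dtype as the "Actual Results" array, and the sequential result matches the "Expected Results".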

LilyX2021 commented, Sep 29, 2018

I’m working on this.
