Consistent ColumnTransformer for intersecting lists of columns
See original GitHub issueI want to use sklearn.compose.ColumnTransformer
consistently (not parallel, so, the second transformer should be executed only after the first) for intersecting lists of columns in this way:
log_transformer = p.FunctionTransformer(lambda x: np.log(x))
df = pd.DataFrame({'a': [1,2, np.NaN, 4], 'b': [1,np.NaN, 3, 4], 'c': [1 ,2, 3, 4]})
compose.ColumnTransformer(n_jobs=1,
transformers=[
('num', impute.SimpleImputer() , ['a', 'b']),
('log', log_transformer, ['b', 'c']),
('scale', p.StandardScaler(), ['a', 'b', 'c'])
]).fit_transform(df)
So, I want to use SimpleImputer
for 'a'
, 'b'
, then log
for 'b'
, 'c'
, and then StandardScaler
for 'a'
, 'b'
, 'c'
.
But:
- I get array of
(4, 7)
shape. - I still get
Nan
ina
andb
columns.
So, how can I use ColumnTransformer
for different columns in the manner of Pipeline
?
P.S. please note that I already asked a question on SO.
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:16 (7 by maintainers)
Top Results From Across the Web
Consistent ColumnTransformer for intersecting lists of columns
I want to use sklearn.compose.ColumnTransformer consistently (not parallel, so, the second transformer should ...
Read more >Using ColumnTransformer to combine data processing steps
ColumnTransformers come in handy when you are creating a data pipeline where different columns need different transformations.
Read more >sklearn.compose.ColumnTransformer
Applies transformers to columns of an array or pandas DataFrame. This estimator allows different columns or column subsets of the input to be...
Read more >Screening hidden columns of pd.describe() lists-Pandas,Python
... Consistent ColumnTransformer for intersecting lists of columns · Run a function for each element in two lists in Pandas Dataframe Columns ...
Read more >Extracting, transforming and selecting features - Apache Spark
VectorAssembler is a transformer that combines a given list of columns into a single vector column. It is useful for combining raw features...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@justmarkham thanks, it works!
@don-prog In
ct2
, specify thecols_3
columns by integer position, not by name.