Cannot get feature names after ColumnTransformer
When I use ColumnTransformer to preprocess different kinds of columns (numeric, categorical, and text) with pipelines, I cannot get the feature names of the final transformed data, which makes debugging hard.
Here is the code:
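import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

titanic_url = ('https://raw.githubusercontent.com/amueller/'
               'scipy-2017-sklearn/091d371/notebooks/datasets/titanic3.csv')
data = pd.read_csv(titanic_url)
target = data.pop('survived')

numeric_columns = ['age', 'sibsp', 'parch']
category_columns = ['pclass', 'sex', 'embarked']
text_columns = ['name', 'home.dest']

# Numeric columns: fill missing values with the median, then standardize
numeric_transformer = Pipeline(steps=[
    ('impute', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])
# Categorical columns: fill missing values with a constant, then one-hot encode
category_transformer = Pipeline(steps=[
    ('impute', SimpleImputer(strategy='constant', fill_value='missing')),
    ('ohe', OneHotEncoder(handle_unknown='ignore'))
])
# Text column: bag-of-words counts
text_transformer = Pipeline(steps=[
    ('cntvec', CountVectorizer())
])

preprocesser = ColumnTransformer(transformers=[
    ('numeric', numeric_transformer, numeric_columns),
    ('category', category_transformer, category_columns),
    ('text', text_transformer, text_columns[0])
])
preprocesser.fit_transform(data)
preprocesser.get_feature_names()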
This raises an error: AttributeError: Transformer numeric (type Pipeline) does not provide get_feature_names.
- In ColumnTransformer, text_transformer can only process a single column given as a string (e.g. ‘Sex’), not a list of strings such as text_columns.
Issue Analytics
- State:
- Created 5 years ago
- Reactions:21
- Comments:20 (4 by maintainers)
Top GitHub Comments
Thanks for your kind reply! As far as I know, when I preprocess a column with a transformer that turns one column into several, such as OneHotEncoder or CountVectorizer, I can get the new column names from the last step of the pipeline via get_feature_names; when a transformer does not create new columns, I can just use the raw column names.

Using above code, I can get my preprocesser's column names. Does this code solve the problem? As for eli5, I could not find that function. Can you give me a link to an explicit example or the API for eli5?

FYI, I wrote some code and a blog post about how to extract the feature names from complex Pipelines & ColumnTransformers. The code is an improvement over my previous post. https://towardsdatascience.com/extracting-plotting-feature-names-importance-from-scikit-learn-pipelines-eb5bfa6a31f4
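
The manual approach described in the first comment (ask the last step of each pipeline for its names, otherwise fall back to the raw column names) might look roughly like the sketch below. This is not the commenter's code, which is not shown in the thread. The helper name collect_feature_names and the `name__feature` prefix are my own choices, and the sketch assumes that only the last step of a pipeline changes the number of columns.

```python
def collect_feature_names(column_transformer):
    """Best-effort output column names for a fitted ColumnTransformer."""
    names = []
    for name, transformer, columns in column_transformer.transformers_:
        # Dropped columns (e.g. the default remainder) produce no output
        if isinstance(transformer, str) and transformer == "drop":
            continue
        # A single column may be given as a scalar, e.g. 'name'
        if isinstance(columns, str) or not hasattr(columns, "__iter__"):
            columns = [columns]
        columns = list(columns)
        # For a Pipeline, the last step decides the output columns
        last = transformer.steps[-1][1] if hasattr(transformer, "steps") else transformer
        getter = (getattr(last, "get_feature_names_out", None)
                  or getattr(last, "get_feature_names", None))
        if getter is None:
            # No naming method (or 'passthrough'): reuse the raw column names
            out = columns
        else:
            try:
                out = getter(columns)   # e.g. OneHotEncoder accepts input names
            except TypeError:
                out = getter()          # e.g. older CountVectorizer takes no argument
        names.extend(f"{name}__{col}" for col in out)
    return names
```

After fitting, it would be called as collect_feature_names(preprocesser). Note that for a 'passthrough' remainder the fitted transformer may report integer column positions rather than names, so those entries come out as numbers.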