Allow stacking pandas dataframes in ColumnTransformer?
Right now we make it easy to override _hstack in ColumnTransformer, but I wonder if we should try to be more generic in the first place. I don’t think we can do fully generic stacking without something like NEP 37, but maybe we can already allow stacking pandas dataframes?
I can think of few cases where all the transformers output pandas dataframes but you want to output a numpy array in the ColumnTransformer. This would be a backward incompatible change, so we’d probably need to make it optional (or deprecate the current behavior). Wdyt?
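To make the trade-off concrete, here is a minimal sketch of the two stacking behaviours. It does not touch ColumnTransformer itself; the two frames and their column names are made up for illustration and stand in for the outputs of two transformers.

```python
import numpy as np
import pandas as pd

# Hypothetical outputs of two transformers, sharing the default index.
scaled = pd.DataFrame({"age_scaled": [0.1, 0.5, 0.9]})
encoded = pd.DataFrame({"city_a": [1, 0, 0], "city_b": [0, 1, 1]})

# Stacking as arrays keeps the values but drops the column names.
as_array = np.hstack([scaled.to_numpy(), encoded.to_numpy()])

# Stacking as frames keeps both the column names and the index.
as_frame = pd.concat([scaled, encoded], axis=1)
```

The array result has no record of which column came from which transformer, which is the information a dataframe output would preserve.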
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:5 (5 by maintainers)
Top GitHub Comments
Sorry, perhaps I mis-parsed your language. You’re saying “You’d rarely want a numpy array from transformers outputting dataframes”.
I think stacking frames into frames is good, at least where the indexes are all identical. We could put it into 1.0 without deprecation if we really want.
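A rough sketch of what "stack only where the indexes are all identical" could look like. The helper name `hstack_frames` is hypothetical and is not part of scikit-learn; it only illustrates the check with plain pandas calls.

```python
import pandas as pd


def hstack_frames(frames):
    """Hypothetical helper: column-wise stack of frames that share one index."""
    first_index = frames[0].index
    for frame in frames[1:]:
        # Index.equals compares labels in order, so a reordered index fails too.
        if not frame.index.equals(first_index):
            raise ValueError("all frames must share an identical index")
    return pd.concat(frames, axis=1)


left = pd.DataFrame({"a": [1.0, 2.0]}, index=[10, 20])
right = pd.DataFrame({"b": [3.0, 4.0]}, index=[10, 20])
stacked = hstack_frames([left, right])
```

Raising on mismatched indexes avoids the silent row realignment (and introduced NaNs) that `pd.concat` would otherwise perform.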
I think that’s what we are assuming at the moment