Inconsistent behavior in ColumnTransformer with "all False" vs "empty list"
Describe the workflow you want to enable
I have a selector function in a ColumnTransformer that returns all False, which should be equivalent to an empty list.
However, an “all False” result from the selector function is not treated as equivalent to an “empty list of explicit column names”.
An “all False” result should skip the transformer, in the same way an empty list does: https://github.com/scikit-learn/scikit-learn/issues/12071
Describe your proposed solution
Treat a list of all False as equivalent to an empty list.
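The proposed behavior can be sketched as a single emptiness check that gives the same answer for an empty list and for an all-False mask. The helper below, is_empty_selection, is hypothetical and is not scikit-learn's internal _is_empty_column_selection. It assumes the selection is a single name or position, a list-like of names or positions, or a boolean mask (a Python list of bools, or an array or Series with a boolean dtype).

```python
def is_empty_selection(columns):
    """Return True when a column selection picks no columns.

    Hypothetical helper: an empty list-like and an all-False boolean mask
    are both treated as selecting nothing.
    """
    # A single column name or position always selects exactly one column.
    if isinstance(columns, (str, int)):
        return False
    # Arrays or Series with a boolean dtype: empty when no entry is True.
    dtype = getattr(columns, "dtype", None)
    if dtype is not None and getattr(dtype, "kind", None) == "b":
        return not bool(columns.any())
    items = list(columns)
    # An empty list of names or positions selects nothing.
    if not items:
        return True
    # A list of plain Python bools is a mask: empty when nothing is True.
    if all(isinstance(item, bool) for item in items):
        return not any(items)
    # Otherwise it is a non-empty list of names or positions.
    return False
```

Under this check, both `[]` and `[False]` count as empty, so a transformer given either one would be skipped.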
Describe alternatives you’ve considered, if relevant
Converting the all-False result to an empty list myself.
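That workaround can be packaged as a small wrapper around the selector. mask_to_names is a hypothetical name. The sketch assumes the selector returns one boolean per column, in column order, as noop_selector does.

```python
def mask_to_names(selector):
    """Wrap a boolean-mask column selector so it returns column labels.

    Hypothetical workaround: an all-False mask becomes an empty list,
    which ColumnTransformer already skips.
    """
    def wrapped(X):
        # Iterating a pandas DataFrame yields its column labels.
        columns = list(X)
        mask = list(selector(X))
        if len(mask) != len(columns):
            raise ValueError("selector must return one boolean per column")
        # Keep only the labels whose mask entry is True.
        return [col for col, keep in zip(columns, mask) if keep]
    return wrapped
```

In the snippet below, this would be used as ('noop', SimpleImputer(), mask_to_names(noop_selector)). It assumes the DataFrame's column labels are strings, because ColumnTransformer interprets a list of integers as column positions.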
Additional context
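Code snippet:
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

def noop_selector(X):
    # Callable column selector that returns an all-False boolean mask.
    return [False for col in X]

# Explicit empty list of columns: the transformer is skipped.
working_pipeline = ColumnTransformer(
    transformers=[
        ('noop', SimpleImputer(), []),
    ]
)

# All-False mask from a callable: expected to be skipped as well.
broken_pipeline = ColumnTransformer(
    transformers=[
        ('noop', SimpleImputer(), noop_selector),
    ]
)

data = pd.DataFrame({'col': ['a', 'b', 'c']})
working_pipeline.fit(data)
broken_pipeline.fit(data)  # fails with the ValueError described below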
The working pipeline skips the SimpleImputer. The broken pipeline tries to fit the imputer on zero columns, resulting in ValueError: at least one array or dtype is required.
Issue Analytics
- Created 3 years ago
- Comments:10 (10 by maintainers)
Top GitHub Comments
Yes, though instead of adding an extra case, maybe we can merge it with the one above about arrays. I’m not quite sure why _is_empty_column_selection was written the way it is, but hopefully our existing tests will tell us what should/could be done. Feel free to submit a PR with a first version, @zachmayer, and we can iterate from there.

I closed my PR in favor of https://github.com/scikit-learn/scikit-learn/pull/17616