Inconsistent behavior in ColumnTransformer with "all False" vs "empty list"

Describe the workflow you want to enable

I have a selector function I’m using in a ColumnTransformer that’s returning all False, which should be equivalent to an empty list.

But “all False” from the selector function is not being treated as equivalent to an empty list of explicit column names.

“all False” should skip the transformer, in the same way an empty list does: https://github.com/scikit-learn/scikit-learn/issues/12071

Describe your proposed solution

Treat a list of all False values as equivalent to an empty list.
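
A minimal sketch of what such a check could look like, written as a standalone helper. The name is_empty_selection is hypothetical and this is not scikit-learn's internal implementation; as an assumption to keep it short, it only recognizes plain Python bools, not NumPy boolean arrays.

```python
def is_empty_selection(columns):
    """Hypothetical check: return True when `columns` selects no columns."""
    # Scalars (a single name or index) and slices are not handled here.
    if isinstance(columns, str) or not hasattr(columns, "__len__"):
        return False
    # An explicit empty list selects nothing.
    if len(columns) == 0:
        return True
    # A boolean mask selects nothing when every entry is False.
    if all(isinstance(c, bool) for c in columns):
        return not any(columns)
    return False


print(is_empty_selection([]))              # explicit empty list
print(is_empty_selection([False, False]))  # all-False mask
print(is_empty_selection([True, False]))   # mask selecting one column
```

With a check like this, the empty list and the all-False mask would take the same "skip the transformer" branch.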

Describe alternatives you’ve considered, if relevant

Converting the all-False result to an empty list myself.
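
One way this workaround might look is a small wrapper that turns a boolean-mask selector into one that returns column names, so an all-False mask becomes an empty list. The helper name names_from_mask is hypothetical, and the sketch assumes the input is a pandas DataFrame.

```python
import pandas as pd


def names_from_mask(selector):
    """Wrap a boolean-mask selector so it returns column names instead."""
    def wrapped(X):
        mask = selector(X)
        # Keep only the column names whose mask entry is True.
        return [col for col, keep in zip(X.columns, mask) if keep]
    return wrapped


def noop_selector(X):
    # Selects nothing: one False per column.
    return [False for col in X]


data = pd.DataFrame({'col': ['a', 'b', 'c']})
selected = names_from_mask(noop_selector)(data)
print(selected)
```

Passing names_from_mask(noop_selector) as the column specifier in place of noop_selector should then behave like the explicit empty list, assuming the callable's result is handled the same way as a list given directly.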

Additional context

Code snippet:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

def noop_selector(X):
    return [False for col in X]

working_pipeline = ColumnTransformer(
    transformers=[
        ('noop', SimpleImputer(), []),
    ]
)

broken_pipeline = ColumnTransformer(
    transformers=[
        ('noop', SimpleImputer(), noop_selector),
    ]
)

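data = pd.DataFrame({'col': ['a', 'b', 'c']})
# Empty list: the imputer is skipped and fit succeeds.
working_pipeline.fit(data)
# All-False selector: the imputer is fitted on no columns and raises ValueError.
broken_pipeline.fit(data)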

The working pipeline skips the simple imputer. The broken pipeline tries to impute with no data, resulting in ValueError: at least one array or dtype is required.

Top GitHub Comments

NicolasHug commented, Jul 2, 2020

Do we want to support “list of boolean” for column selection?

Yes, though instead of having an additional extra case, maybe we can merge it with the one above about arrays. I’m not quite sure why _is_empty_column_selection was written the way it is, but hopefully our existing tests will tell us what should/could be done. Feel free to submit a PR with a first version @zachmayer and we can iterate from there.

zachmayer commented, Jul 8, 2020

I closed my PR in favor of https://github.com/scikit-learn/scikit-learn/pull/17616