ColumnTransformer should be able to use a function to select columns

The new ColumnTransformer allows the user to specify column names or indices. I think it should be possible to specify a set of columns as a function. Indeed, towards #10603, we should probably have an inbuilt function that distinguishes between categorical and numeric pd.Series dtypes.

To support the remainder functionality, I believe the function should have the signature (X,) -> column indices, where "column indices" is any of the supported column specification formats.
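A minimal sketch of why that signature helps with remainder, written in plain Python rather than against scikit-learn's internals. The helpers `resolve_columns`, `remainder_columns`, and `numeric_columns` are hypothetical names made up for illustration: once every callable has been resolved against X into concrete indices, the remainder is just the set of columns nobody claimed.

```python
def resolve_columns(spec, X):
    """Turn a column spec into concrete indices.

    A spec is either a list of indices or a callable with signature
    (X,) -> indices. Hypothetical helper, not a scikit-learn function.
    """
    indices = spec(X) if callable(spec) else spec
    return list(indices)


def remainder_columns(specs, X):
    """Return the indices of columns not claimed by any spec."""
    n_columns = len(X[0])
    used = set()
    for spec in specs:
        used.update(resolve_columns(spec, X))
    return [j for j in range(n_columns) if j not in used]


def numeric_columns(X):
    """Hypothetical selector: columns whose values are all int or float."""
    return [
        j
        for j in range(len(X[0]))
        if all(
            isinstance(row[j], (int, float)) and not isinstance(row[j], bool)
            for row in X
        )
    ]


# Toy data: columns 0 and 2 are numeric, columns 1 and 3 are strings.
X = [[1.0, "a", 3, "x"], [2.5, "b", 4, "y"]]

selected = resolve_columns(numeric_columns, X)
rest = remainder_columns([numeric_columns, [1]], X)
```

Here `selected` is `[0, 2]` and `rest` is `[3]`: the callable claimed the two numeric columns, the explicit spec claimed column 1, and only column 3 is left for the remainder.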

Sound good @amueller, @jorisvandenbossche?

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments:21 (21 by maintainers)

Top GitHub Comments

2 reactions
jnothman commented, Jun 10, 2018

I think you should be very careful to avoid any operator overloading. Scikit-learn tends to be quite conservative in this area, and tries to avoid “magic” in its APIs. Don’t worry about defining categorical_features, etc. Worry about ensuring that there’s an interface for users to write custom selectors that don’t require them to know the columns by name or index beforehand, as the current interface requires. Later we can worry about making a library of such selectors and how that would look.
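As an illustration of the kind of custom selector described here, the following sketch passes plain functions as the column specification of a ColumnTransformer, which current scikit-learn releases accept. The selector names (`numeric_selector`, `low_cardinality`), the threshold, and the toy DataFrame are assumptions made up for the example, and treating non-numeric, low-cardinality columns as "categorical" is only one possible definition.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def numeric_selector(X):
    """Hypothetical selector: names of all numeric columns in X."""
    return X.select_dtypes(include="number").columns.tolist()


def low_cardinality(max_levels):
    """Build a selector for non-numeric columns with few distinct values.

    The threshold is an illustrative choice, not a scikit-learn default.
    """
    def select(X):
        non_numeric = X.select_dtypes(exclude="number")
        return [
            name
            for name in non_numeric.columns
            if non_numeric[name].nunique() <= max_levels
        ]
    return select


# Toy data, invented for the example.
df = pd.DataFrame(
    {
        "age": [22, 35, 58, 41],
        "income": [30.0, 52.5, 71.0, 44.2],
        "city": ["Oslo", "Rome", "Oslo", "Lima"],
    }
)

# Each selector is called with X at fit time, so no column names are
# hard-coded in the transformer definition.
ct = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_selector),
        ("cat", OneHotEncoder(), low_cardinality(max_levels=5)),
    ]
)
Xt = ct.fit_transform(df)
```

Because each selector is an ordinary function, a user with a different notion of "categorical" can swap in their own without any operator overloading or special syntax.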

1 reaction
jnothman commented, Jun 5, 2018

Thanks @partmor, I look forward to your PR. Please start with supporting functions, testing that, and demonstrating it with a realistic example. Then we can consider simplifying the API for common use cases. @jorisvandenbossche’s proposal does not, for instance, account for disparity in the definition of “categorical feature”.

Top Results From Across the Web

How to Use the ColumnTransformer for Data Preparation
In this tutorial, you will discover how to use the ColumnTransformer to selectively apply data transforms to columns in a dataset with mixed ...

sklearn.compose.ColumnTransformer
To select multiple columns by name or dtype, you can use make_column_selector. remainder: {'drop', 'passthrough'} or estimator, default='drop'.

Use ColumnTransformer to apply different preprocessing to ...
Use ColumnTransformer to apply different preprocessing to different columns: select from DataFrame columns by name; passthrough or drop ...

how to use ColumnTransformer() to return a dataframe?
With sklearn version 1.2.0 it will be possible to solve the problem of returning a DataFrame when transforming a ColumnTransformer instance ...

Pipeline, ColumnTransformer and FeatureUnion explained
We will use estimator and model interchangeably throughout this post. ... we added an extra step where we selected relevant columns using a ...