Support inverse_transform in ColumnTransformer

Since ColumnTransformer is primarily used to apply basic preprocessing, it would be extremely helpful if it (and FeatureUnion while we’re at it) supported inverse_transform, provided each of its constituent transformers does too. This would also replace some functionality we might otherwise support with parameters such as categorical_features or ignored_features (H/T @qinhanmin2014).

In order for it to support inverse_transform, it would need to have an index of which columns in the output came from which transformers, which is what I intended to solve in #1952. This is not hard as long as fit_transform, rather than fit, is called, or as long as all the constituent transformers support get_feature_names. Maybe we could force a run of transform when fit is called in order to check the output size.

(Alternatively, it might be nice to have a consistent API for a transformer to declare its output size, which n_components_ does for a handful of estimators.)
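The bookkeeping described above can be sketched as follows. This is only an illustration of the idea, not how scikit-learn implements it: `fit_with_output_index` and `inverse_with_output_index` are hypothetical helper names, the sketch assumes a purely numeric NumPy input, and it assumes every transformer supports `inverse_transform`. Each transformer is run once during fitting so that its output width is known, and the resulting slice is stored next to the input columns it consumed.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def fit_with_output_index(transformers, X):
    """Fit each (transformer, columns) pair and record its output slice."""
    blocks, index, start = [], [], 0
    for transformer, columns in transformers:
        out = transformer.fit_transform(X[:, columns])
        if hasattr(out, "toarray"):  # densify sparse output (e.g. OneHotEncoder)
            out = out.toarray()
        stop = start + out.shape[1]
        # Remember which output columns this transformer produced.
        index.append((transformer, columns, slice(start, stop)))
        blocks.append(out)
        start = stop
    return np.hstack(blocks), index


def inverse_with_output_index(index, Xt, n_input_columns):
    """Route each output slice back through its transformer's inverse."""
    X = np.empty((Xt.shape[0], n_input_columns))
    for transformer, columns, out_slice in index:
        X[:, columns] = transformer.inverse_transform(Xt[:, out_slice])
    return X


X = np.array([[1.0, 10.0, 0.0],
              [2.0, 20.0, 1.0],
              [3.0, 30.0, 2.0],
              [4.0, 40.0, 1.0]])
# Column 2 holds category codes, so one input column becomes three outputs.
transformers = [(StandardScaler(), [0, 1]), (OneHotEncoder(), [2])]
Xt, index = fit_with_output_index(transformers, X)
X_back = inverse_with_output_index(index, Xt, X.shape[1])
```

Because the one-hot encoder widens a single input column, the output slices cannot be inferred from the input column lists alone, which is why the sketch measures each output during fitting.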

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 26
  • Comments: 18 (9 by maintainers)

Top GitHub Comments

3 reactions
arkanoid87 commented, Jan 10, 2019

I need to apply a different scaling to every column of my dataset. In particular, I need to rescale the columns dt.dayofyear, dt.month, dt.quarter and dt.hour to sine and cosine for ML. All my transform functions are reversible, so I tried (without reading the manual in depth) to combine ColumnTransformer and Pipeline, but then I realised that ColumnTransformer does not expose the inverse_transform functions defined on the FunctionTransformers in my Pipelines.

So I coded this bare encoder, to which I pass a list of encoders matching the column order of my dataset, and it seems to work:

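import numpy as np
from sklearn.utils import check_array

class ColumnScaler:
    """Apply one encoder per input column (each encoder maps 1 column to 1 column)."""

    def __init__(self, encoders=None):
        self.encoders = encoders

    def fit(self, X, y=None):
        X = check_array(X)
        # Fit each encoder on its own column, reshaped to (n_samples, 1).
        for column, encoder in zip(X.T, self.encoders):
            encoder.fit(column.reshape(-1, 1))
        return self  # return self so calls can be chained

    def transform(self, X):
        X = check_array(X)
        # Build a list: np.column_stack needs a sequence, and recent
        # NumPy versions reject a bare map iterator.
        return np.column_stack([
            encoder.transform(column.reshape(-1, 1))
            for column, encoder in zip(X.T, self.encoders)
        ])

    def inverse_transform(self, X):
        X = check_array(X)
        # Same column-by-column routing, using each encoder's inverse.
        return np.column_stack([
            encoder.inverse_transform(column.reshape(-1, 1))
            for column, encoder in zip(X.T, self.encoders)
        ])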

I use it like this:

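import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

def cycle_pipeline():
    # Scale to degrees, convert to radians, then take the sine.
    # Note: arcsin only inverts sin on [-90, 90] degrees, so the
    # round trip is lossy for angles outside that range.
    return Pipeline([
        ('MinMaxScaler', MinMaxScaler(feature_range=(0, 360))),
        ('deg2rad', FunctionTransformer(func=np.deg2rad, inverse_func=np.rad2deg, validate=True)),
        ('sin', FunctionTransformer(func=np.sin, inverse_func=np.arcsin, validate=True)),
    ])

# One encoder per column, in dataset column order:
# four plain scalers followed by eight cyclic pipelines.
encoders = [
    MinMaxScaler(),
    MinMaxScaler(),
    MinMaxScaler(),
    MinMaxScaler(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline()
]
scaler = ColumnScaler(encoders)
scaler.fit(data_copy)  # data_copy: the author's dataset (not shown)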
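A caveat about the reversibility of a sine-only encoding, with a minimal NumPy sketch: arcsin returns values only in [-90°, 90°], so a single sine column cannot be inverted over a full cycle. Keeping both the sine and the cosine and decoding with arctan2 recovers the original value. The helper names `encode_cyclic` and `decode_cyclic` are hypothetical and not part of the comment above or of scikit-learn.

```python
import numpy as np


def encode_cyclic(values, period):
    """Map values in [0, period) to a (sin, cos) pair per value."""
    angle = 2.0 * np.pi * np.asarray(values, dtype=float) / period
    return np.column_stack([np.sin(angle), np.cos(angle)])


def decode_cyclic(encoded, period):
    """Recover the original values from (sin, cos) pairs via arctan2."""
    angle = np.arctan2(encoded[:, 0], encoded[:, 1])  # in (-pi, pi]
    return np.mod(angle, 2.0 * np.pi) * period / (2.0 * np.pi)


# Sine alone is lossy: 150 degrees comes back as roughly 30 degrees.
lossy = np.rad2deg(np.arcsin(np.sin(np.deg2rad(150.0))))

# Sine plus cosine round-trips across the whole cycle.
hours = np.array([0, 5, 12, 18, 23])
encoded = encode_cyclic(hours, period=24)
decoded = decode_cyclic(encoded, period=24)
```

Wrapped in a FunctionTransformer with `func` and `inverse_func`, a pair like this would produce two output columns per input column, so it would not fit the one-column-per-encoder assumption of ColumnScaler without further changes.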
2 reactions
gilbertovilarunc commented, Feb 27, 2022

Is it solved? I really need this feature!
