
Add support to Imbalanced-learn pipelines in ClassifierExplainer


When I try to generate a `ClassifierExplainer` on an imblearn pipeline, I get the following error:

TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough'
'SMOTETomek(random_state=42)' (type <class 'imblearn.combine._smote_tomek.SMOTETomek'>) doesn't

Full traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-75-2a92cf12a19d> in <module>
      1 from explainerdashboard import ClassifierExplainer, InlineExplainer
----> 2 explainer = ClassifierExplainer(best_model, X_test, y_test)

/opt/conda/lib/python3.7/site-packages/explainerdashboard/explainers.py in __init__(self, model, X, y, permutation_metric, shap, X_background, model_output, cats, cats_notencoded, idxs, index_name, target, descriptions, n_jobs, permutation_cv, cv, na_fill, precision, labels, pos_label)
   1999                             cats, cats_notencoded, idxs, index_name, target,
   2000                             descriptions, n_jobs, permutation_cv, cv, na_fill,
-> 2001                             precision)
   2002 
   2003         assert hasattr(model, "predict_proba"), \

/opt/conda/lib/python3.7/site-packages/explainerdashboard/explainers.py in __init__(self, model, X, y, permutation_metric, shap, X_background, model_output, cats, cats_notencoded, idxs, index_name, target, descriptions, n_jobs, permutation_cv, cv, na_fill, precision)
    138             if shap != 'kernel':
    139                 pipeline_model = model.steps[-1][1]
--> 140                 pipeline_transformer = Pipeline(model.steps[:-1])
    141                 if hasattr(model, "predict") and hasattr(pipeline_transformer, "transform"):
    142                     X_transformed = pipeline_transformer.transform(X)

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py in __init__(self, steps, memory, verbose)
    116         self.memory = memory
    117         self.verbose = verbose
--> 118         self._validate_steps()
    119 
    120     def get_params(self, deep=True):

/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py in _validate_steps(self)
    169                                 "transformers and implement fit and transform "
    170                                 "or be the string 'passthrough' "
--> 171                                 "'%s' (type %s) doesn't" % (t, type(t)))
    172 
    173         # We allow last estimator to be None as an identity transformation

TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'SMOTETomek(random_state=42)' (type <class 'imblearn.combine._smote_tomek.SMOTETomek'>) doesn't

Versions

System:
    python: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0]
    executable: /opt/conda/bin/python
    machine: Linux-5.4.120+-x86_64-with-debian-buster-sid

Python dependencies:
    pip: 21.1.2
    setuptools: 49.6.0.post20210108
    sklearn: 0.24.2
    imblearn: 0.8.0
    explainerdashboard: latest
    numpy: 1.19.5
    scipy: 1.6.3
    Cython: 0.29.23
    pandas: 1.2.4
    matplotlib: 3.4.2
    joblib: 1.0.1
    threadpoolctl: 2.1.0

Issue created 2 years ago. Comments: 6 (3 by maintainers)

Top GitHub Comments


oegedijk commented, Dec 26, 2021

I’m not very familiar with these types of pipelines, but it seems you could do something like this to get the transformed input dataframe and then extract the model:

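import pandas as pd
from sklearn.pipeline import make_pipeline

# Apply every step except the final estimator to the test features.
X_transformed = pd.DataFrame(
    make_pipeline(*[step[1] for step in pipeline.steps[:-1]]).transform(X_test),
    columns=X_test.columns
)
# The last step of the fitted pipeline is the model itself.
model = pipeline.steps[-1][1]

explainer = ClassifierExplainer(model, X_transformed, y_test)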

Actually, maybe I could add some support for this…
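
One caveat with the snippet above: an imblearn sampler such as `SMOTETomek` has no `transform` method, so rebuilding the preprocessing steps with scikit-learn's `make_pipeline` may fail in the same way as the original error. imbalanced-learn's own `Pipeline` applies samplers only during `fit` and skips them at prediction time, so a workaround can do the same. The following is a minimal sketch, not explainerdashboard's implementation. `split_fitted_pipeline` is a hypothetical helper name, and the sketch assumes the pipeline is already fitted and exposes a `steps` list of `(name, step)` pairs.

```python
def split_fitted_pipeline(pipeline, X):
    """Return (X_transformed, model) for an already fitted pipeline.

    Hypothetical helper: walks the steps before the final estimator and
    applies only those that implement `transform`. Steps without it
    (for example imblearn samplers, which only act during fit) and
    'passthrough'/None placeholders are skipped.
    """
    *preprocessing, (_, model) = pipeline.steps

    X_transformed = X
    for _name, step in preprocessing:
        # Placeholders that sklearn-style pipelines allow as steps.
        if step is None or isinstance(step, str):
            continue
        # Samplers expose fit_resample but no transform, so they are
        # left out here, mirroring what happens at prediction time.
        if hasattr(step, "transform"):
            X_transformed = step.transform(X_transformed)

    return X_transformed, model
```

With the pipeline from the issue, this would presumably be called as `X_transformed, model = split_fitted_pipeline(best_model, X_test)`. Wrap `X_transformed` in a `pd.DataFrame` before passing it to `ClassifierExplainer`, and note that reusing `X_test.columns` as column names is only valid if no step changes the number or order of columns.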


Abdelgha-4 commented, Dec 25, 2021

Ah, I see. This is regrettable, as I would love to use your work for imbalanced-data models. Is there any workaround for this in the meantime?
