Lime with pipeline

Could you show an example of how to use LIME with a Pipeline? Running

explainer = lime.lime_tabular.LimeTabularExplainer(
    union, feature_names=names, class_names=y_train,
    categorical_features=cat, verbose=True, mode='regression')

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

raises the error above. Here union is the sparse matrix produced by the FeatureUnion step. The pipeline is:

pipeline = Pipeline([
    # norm, MultiColumn, Cat and Age are my own transformers (not shown)
    # Use FeatureUnion to combine the features
    ('union', FeatureUnion(
        transformer_list=[
            # text
            ('text', Pipeline([
                ('select', norm()),
                ('tfidf', TfidfVectorizer(max_df=40, min_df=3, ngram_range=(1, 4))),
            ])),
            # categorical
            ('categorical', Pipeline([
                ('selector', MultiColumn(columns=['a', 'b'])),
                ('one_hot', Cat()),
            ])),
            # numeric
            ('numeric', Pipeline([
                ('date', Age()),
                ('scaling', preprocessing.MinMaxScaler()),
            ])),
        ])),
    # Use a regression model
    ('model_fitting', xgb),
])

For the text features, I extract the feature names with

one = pipeline.named_steps['union'].transformer_list[0][1].named_steps['tfidf'].get_feature_names()

Then I get the names of the categorical variables:

enc = DictVectorizer(sparse=False)
cat = X[['a', 'b']]
enccat = enc.fit(cat.reset_index(drop=True).T.to_dict().values())
two = enccat.get_feature_names()

and for the numeric feature, just

three = ['age']

Then I build the arrays of all feature names and of the categorical feature names:

names = one + two + three
names = np.asarray(names)
cat = np.asarray(two)

I get the sparse matrix by applying only the FeatureUnion step of the pipeline:

union = union.fit_transform(X_train)
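Separately from the one-hot issue, the LIME documentation describes training_data as a 2-D numpy array and categorical_features as a list of integer column indices, whereas cat above holds names and union is sparse. A minimal sketch of adapting both, with hypothetical names and a small identity matrix standing in for the FeatureUnion output; it only fixes the input format, and each one-hot column would still be treated as an independent feature.

```python
import numpy as np
from scipy import sparse

# Hypothetical names standing in for one (tf-idf terms), two and three.
one = ["good", "service"]
two = ["a=x", "a=y", "b=p"]
three = ["age"]
names = np.asarray(one + two + three)

# categorical_features takes column positions, so map names to indices.
two_set = set(two)
cat = [i for i, name in enumerate(names) if name in two_set]

# Densify the sparse matrix (a stand-in for union.fit_transform(X_train)).
union = sparse.csr_matrix(np.eye(4, len(names)))
union_dense = union.toarray()
```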

Issue Analytics

Created 6 years ago
Comments: 6 (2 by maintainers)

Top GitHub Comments

marcotcr commented, Feb 4, 2018 (1 reaction)

LimeTabularExplainer assumes that the input matrix is dense and that each column represents a feature (i.e. it's not one-hot encoded). Unfortunately, this would require you to break the pipeline in two and let the explainer do the one-hot encoding. (Reopen if this doesn't answer your question.)
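A minimal sketch of what breaking the pipeline in two could look like, on made-up data. Label encoding (scikit-learn's OrdinalEncoder, swapped in for the question's custom encoders) runs before LIME, and the one-hot step (OneHotEncoder inside a ColumnTransformer) moves into the model pipeline, with a RandomForestRegressor standing in for xgb. LIME then sees a dense matrix with one column per feature, and categorical_names maps each integer code back to its label. The data, column names and the explain_row helper are hypothetical; the lime import sits inside the helper so the rest runs without lime installed.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Made-up data: two categorical columns and one numeric column.
X_raw = pd.DataFrame({
    "a": ["x", "y", "z", "x", "y"] * 10,
    "b": ["p", "q", "q", "p", "q"] * 10,
    "age": np.linspace(20, 60, 50),
})
y = np.linspace(0, 1, 50)
cat_cols, num_cols = ["a", "b"], ["age"]

# Stage 1 (outside the model): label-encode each categorical column so the
# matrix LIME sees is dense, with exactly one column per original feature.
ordinal = OrdinalEncoder()
X_codes = np.column_stack([
    ordinal.fit_transform(X_raw[cat_cols]),
    X_raw[num_cols].to_numpy(),
])

# Stage 2 (the model): one-hot encode the integer codes, then regress.
model = Pipeline([
    ("onehot", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), [0, 1])],
        remainder="passthrough")),
    ("regressor", RandomForestRegressor(n_estimators=20, random_state=0)),
])
model.fit(X_codes, y)

# LIME metadata: categorical column indices and a label for each code.
feature_names = cat_cols + num_cols
categorical_features = [0, 1]
categorical_names = {i: list(c) for i, c in enumerate(ordinal.categories_)}


def explain_row(row, num_features=3):
    """Explain one row of X_codes; requires the `lime` package."""
    from lime.lime_tabular import LimeTabularExplainer

    explainer = LimeTabularExplainer(
        X_codes,
        mode="regression",
        feature_names=feature_names,
        categorical_features=categorical_features,
        categorical_names=categorical_names,
    )
    # model.predict accepts label-encoded rows, so LIME's perturbed samples
    # pass through the one-hot step inside the pipeline.
    return explainer.explain_instance(row, model.predict, num_features=num_features)
```

With lime installed, explain_row(X_codes[0]).as_list() should give (feature, weight) pairs for the first row. In real use you would build the explainer once rather than per call. The question's tf-idf text features are not covered by this sketch.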

biomed-dmnlp commented, Oct 27, 2021 (0 reactions)

Hi marcotcr, could you explain how to break the pipeline in two? I am also interested in how to explain a model that uses mixed data types (categorical and text). Thanks.
