Lime with pipeline

Could you show an example of how to use LIME with a Pipeline? I run

explainer = lime.lime_tabular.LimeTabularExplainer(union, feature_names=names, 
class_names=y_train, categorical_features=cat, verbose=True, mode='regression')

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

Here union is the sparse matrix obtained from the FeatureUnion. The pipeline is shown below.

pipeline = Pipeline([
    # Use FeatureUnion to combine the features
    ('union', FeatureUnion(
        transformer_list=[
            # text
            ('text', Pipeline([
                ('select', norm()),
                ('tfidf', TfidfVectorizer(max_df=40, min_df=3, ngram_range=(1, 4)))
            ])),
            # categorical
            ('categorical', Pipeline([
                ('selector', MultiColumn(columns=['a', 'b'])),
                ('one_hot', Cat())
            ])),
            # numeric
            ('numeric', Pipeline([
                ('date', Age()),
                ('scaling', preprocessing.MinMaxScaler())
            ])),
        ])),
    # Use a regression
    ('model_fitting', xgb),
])

I extract the feature names as follows. For the text features:

one = pipeline.named_steps['union'].transformer_list[0][1].named_steps['tfidf'].get_feature_names()

Then the names of the categorical variables:

enc = DictVectorizer(sparse = False)
cat = X[['a', 'b']]
enccat = enc.fit((cat.reset_index(drop=True)).T.to_dict().values())
two = enccat.get_feature_names()

and for the numeric feature just

three = ['age']

Then I build the array of feature names:

names = one+two+three
names = np.asarray(names)
cat = np.asarray(two)

I get the sparse matrix by using only the FeatureUnion part of the Pipeline:

union = union.fit_transform(X_train)

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
marcotcr commented, Feb 4, 2018

LimeTabularExplainer assumes the input matrix is dense, and that each column represents a feature (i.e. it’s not one-hot encoded). Unfortunately this would require you to break the pipeline in two and let the explainer do the one-hot encoding. (reopen if this doesn’t answer your question)
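
One way to read "break the pipeline in two" is sketched below. This is only an illustration under stated assumptions, not the maintainer's own code: the toy data, the LinearRegression stand-in for xgb, and the names X_dense, model, and predict_fn are all hypothetical. The first half produces a dense matrix with one column per feature (categoricals kept as integer codes), and the second half does the one-hot encoding inside the prediction function. Text features are left out, since the tabular explainer expects one column per feature.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: columns 0 and 1 are integer-coded categoricals
# (stand-ins for 'a' and 'b'), column 2 is a numeric feature ('age').
rng = np.random.RandomState(0)
X_dense = np.column_stack([
    rng.randint(0, 3, size=200),    # 'a' with 3 levels
    rng.randint(0, 2, size=200),    # 'b' with 2 levels
    rng.uniform(18, 80, size=200),  # 'age'
]).astype(float)
y = 2.0 * X_dense[:, 0] - 1.0 * X_dense[:, 1] + 0.1 * X_dense[:, 2]

# Second half of the pipeline: the one-hot encoding lives here, so whatever
# calls the model only ever sees one dense column per feature.
model = Pipeline([
    ('one_hot', ColumnTransformer(
        [('cat', OneHotEncoder(handle_unknown='ignore'), [0, 1])],
        remainder='passthrough')),
    ('regressor', LinearRegression()),  # stand-in for the xgb model
])
model.fit(X_dense, y)


def predict_fn(rows):
    """Take dense rows (one column per feature) and return predictions."""
    return model.predict(np.asarray(rows, dtype=float))


feature_names = ['a', 'b', 'age']
categorical_features = [0, 1]  # column indices, not feature names
```

With lime installed, these pieces would presumably be passed as LimeTabularExplainer(X_dense, mode='regression', feature_names=feature_names, categorical_features=categorical_features), followed by explainer.explain_instance(X_dense[0], predict_fn). Note that categorical_features holds column indices rather than an array of names.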

0 reactions
biomed-dmnlp commented, Oct 27, 2021

Hi marcotcr, could you explain how to break the pipeline in two? I am also interested in how to explain a model that uses mixed types of data (categorical and text). Thanks.
