
Multiple Sentence Inputs


My model takes 2 sentences as input, as well as a number of continuous values. The sentences are converted into 2 sentence vectors, operations are applied to them, and then are passed on to the next model along with the continuous values. This model then predicts semantic similarity (softmax) based on cosine distance between sentence vectors and the extra continuous valued features.

I would like to see if I can adapt parts of the LimeTextExplainer in order to explain which parts of each sentence result in high semantic similarity.

Do you have any suggestions on where I should begin? Is it a viable idea? Any help would be appreciated.

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

2 reactions
marcotcr commented, Sep 12, 2016

Oh, this is interesting - you have both text AND continuous features. If all you care about is explaining the text parts while keeping the continuous values fixed, it should be easy, as the only difference is that you have two sentences instead of one. What you would need to do in this case is to put both sentences in a single string, separated by a character that is not caught by the split_expression parameter (see the initializer of LimeTextExplainer). You could then define a classifier_fn that takes whatever text LIME gives it, splits it into the two sentences, computes the sentence vectors, and so on. Your code would look something like this:

import numpy as np

def get_classifier_fn(continuous_features):
    def classifier_fn(text_list):
        ret = []
        for text in text_list:
            sentence1, sentence2 = text.split(SPLIT_CHARACTER)
            # assuming this returns a number between 0 and 1
            ret.append(model.predict_similarity(get_embedding(sentence1),
                                                get_embedding(sentence2),
                                                continuous_features))
        # LIME expects a 2-D array of per-class probabilities,
        # so return one column per class
        return np.array([[1 - p, p] for p in ret])
    return classifier_fn

Then, if you wanted to explain a particular instance, you would have to do something like this:

def explain_instance(instance):
    text = '%s %s %s' % (instance.sentence1, SPLIT_CHARACTER, instance.sentence2)
    fn = get_classifier_fn(instance.continuous_features)
    explainer = LimeTextExplainer(split_expression=SPLIT_EXPRESSION, bow=BOW)
    return explainer.explain_instance(text, fn, labels=(0,))
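Putting the two pieces together, here is a minimal self-contained sketch of the pack-then-split pattern. The embedding and similarity functions are stand-ins (a toy hashed bag-of-words and cosine similarity), and the '|' separator assumes a split_expression that only splits on whitespace; LimeTextExplainer itself is left out so the sketch runs without lime installed:

```python
import numpy as np

SPLIT_CHARACTER = '|'  # safe only if split_expression splits on whitespace

def get_embedding(sentence):
    # stand-in embedding: hash words into a small fixed-size unit vector
    vec = np.zeros(8)
    for word in sentence.split():
        vec[hash(word) % 8] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def predict_similarity(emb1, emb2, continuous_features):
    # stand-in model: cosine similarity mapped into [0, 1];
    # a real model would also use the continuous features
    return (float(np.dot(emb1, emb2)) + 1.0) / 2.0

def get_classifier_fn(continuous_features):
    def classifier_fn(text_list):
        probs = []
        for text in text_list:
            sentence1, sentence2 = (s.strip() for s in text.split(SPLIT_CHARACTER))
            p = predict_similarity(get_embedding(sentence1),
                                   get_embedding(sentence2),
                                   continuous_features)
            probs.append([1.0 - p, p])  # LIME wants per-class probabilities
        return np.array(probs)
    return classifier_fn

# pack two sentences into one string, as in explain_instance above
text = 'the cat sat %s a cat sat down' % SPLIT_CHARACTER
fn = get_classifier_fn(continuous_features=[0.3])
print(fn([text]).shape)  # (1, 2)
```

The packed string is what gets handed to LimeTextExplainer.explain_instance; LIME perturbs it word by word, and classifier_fn unpacks each perturbed variant back into two sentences before scoring.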

You may have to be a bit creative if you want to use the visualizations we have, since the displayed text will contain both sentences. Also, consider whether it makes sense to use bow=True or False in this case; if you're doing anything with sequences, I would say False makes more sense.

If you want to explain the impact of both the words and the numerical features, you'll have to write a function that perturbs both at the same time. If this is the case, you would want to do a mix of __data_labels_distances in lime_text.py and __data_inverse in lime_tabular.py.

Both are viable ideas, the second one is definitely a bit more work. I would be very interested to see what you end up with, I’ve never thought about this particular use case. If you are comfortable sharing what the application is, please tell me over email : ). Let me know if you have any more questions too.

Best,

0 reactions
Aureole-1210 commented, Jul 15, 2022

Thank you so so much! Your explanations are super helpful. For now, I’m particularly interested in the text components so will focus on that. I will definitely drop you a mail 😃

Hi! I am working on the same task. However, I find that the final score of every token is similar, which I suspect is incorrect. Could you please tell me how you completed it? Much appreciated!
