facebook/bart-large-mnli input format
Hi folks,
First off, I’ve been using you guys since the early days and think the effort and time that you put in is just phenomenal. Thank you. All the postgrads I know at the Uni of Edinburgh love HuggingFace.
My question concerns the usage of the facebook/bart-large-mnli checkpoint - specifically the input formatting. The paper mentions that inputs are concatenated and appended with an EOS token, which is then passed to the classification head.
Something like below perhaps? If this is the case, the probabilities do not seem right, seeing as the first two sentences are the exact same.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

t = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
mc = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")

s1 = torch.tensor(t("i am good. [EOS] i am good.", padding="max_length")["input_ids"])
s2 = torch.tensor(t("i am good. [EOS] i am NOT good.", padding="max_length")["input_ids"])
s3 = torch.tensor(t("i am good. [EOS] i am bad.", padding="max_length")["input_ids"])

with torch.no_grad():
    logits = mc(torch.stack((s1, s2, s3)), output_hidden_states=True)[0]

sm = torch.nn.Softmax(dim=-1)
print(sm(logits))
# tensor([[0.2071, 0.3143, 0.4786], # these sentences are the exact same, so why just 0.47?
# [0.6478, 0.1443, 0.2080], # slightly better, but this checkpoint gets ~80% acc on MNLI
# [0.3937, 0.2987, 0.3076]]) # This distribution is almost random, but the sentences are the exact opposite
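For comparison, one way to reproduce the NLI input format without hand-writing separators (a sketch, assuming a standard transformers install; this is the tokenizer's pair-encoding API, not something confirmed in this thread) is to pass premise and hypothesis as a sentence pair, so the tokenizer inserts BART's own separator tokens:

```python
from transformers import AutoTokenizer

t = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")

# Encoding premise and hypothesis as a pair lets the tokenizer build
# <s> premise </s></s> hypothesis </s> itself, instead of relying on a
# literal "[EOS]" string inside the text.
enc = t("i am good.", "i am NOT good.")
print(t.convert_ids_to_tokens(enc["input_ids"]))
```

The resulting `input_ids` can then be stacked and passed to the model exactly as in the snippet above.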
I note that [EOS] is not registered among the tokenizer's special tokens. When I use <s> or </s> instead, I get similar results.
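As a quick sanity check (again a sketch, assuming a standard transformers install), the tokenizer can report what it actually registers as EOS, and what a literal "[EOS]" turns into:

```python
from transformers import AutoTokenizer

t = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")

print(t.eos_token)                      # the registered EOS string for this checkpoint
print("[EOS]" in t.all_special_tokens)  # "[EOS]" is not among the special tokens
print(t.tokenize("[EOS]"))              # so it is split into ordinary BPE pieces
```

Because "[EOS]" is not special, it is byte-pair-encoded as plain text, which would explain why swapping in <s> or </s> as raw strings behaves similarly.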
Issue Analytics
- State:
- Created: 3 years ago
- Comments: 7 (4 by maintainers)
Interesting. Happy to look into it if there’s a bug, but otherwise I think this is just a model issue. (Bug = the prediction is very different from the fairseq model for the same input).
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.