facebook/bart-large-mnli input format
Hi folks,
First off, I’ve been using you guys since the early days and think the effort and time that you put in is just phenomenal. Thank you. All the postgrads I know at the Uni of Edinburgh love HuggingFace.
My question concerns the usage of the facebook/bart-large-mnli checkpoint - specifically the input formatting. The paper mentions that inputs are concatenated and appended with an EOS token, which is then passed to the classification head.
Something like below perhaps? If this is the case, the probabilities do not seem right, seeing as the first two sentences are the exact same.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

t = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
mc = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")

s1 = torch.tensor(t("i am good. [EOS] i am good.", padding="max_length")["input_ids"])
s2 = torch.tensor(t("i am good. [EOS] i am NOT good.", padding="max_length")["input_ids"])
s3 = torch.tensor(t("i am good. [EOS] i am bad.", padding="max_length")["input_ids"])

with torch.no_grad():
    logits = mc(torch.stack((s1, s2, s3)), output_hidden_states=True)[0]

sm = torch.nn.Softmax(dim=-1)
print(sm(logits))
# tensor([[0.2071, 0.3143, 0.4786], # these sentences are the exact same, so why just 0.47?
# [0.6478, 0.1443, 0.2080], # slightly better, but this checkpoint gets ~80% acc on MNLI
# [0.3937, 0.2987, 0.3076]]) # This distribution is almost random, but the sentences are the exact opposite
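For comparison, one way to reproduce the NLI input format without hand-writing separators (a sketch, assuming a standard transformers install; this is the tokenizer's pair-encoding API, not something confirmed in this thread) is to pass premise and hypothesis as a sentence pair, so the tokenizer inserts BART's own separator tokens:

```python
from transformers import AutoTokenizer

t = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")

# Encoding premise and hypothesis as a pair lets the tokenizer build
# <s> premise </s></s> hypothesis </s> itself, instead of relying on a
# literal "[EOS]" string inside the text.
enc = t("i am good.", "i am NOT good.")
print(t.convert_ids_to_tokens(enc["input_ids"]))
```

The resulting `input_ids` can then be stacked and passed to the model exactly as in the snippet above.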
I note that [EOS] is not registered among the tokenizer's special tokens. When I use <s> or </s> instead, I get similar results.
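As a quick sanity check (again a sketch, assuming a standard transformers install), the tokenizer can report what it actually registers as EOS, and what a literal "[EOS]" turns into:

```python
from transformers import AutoTokenizer

t = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")

print(t.eos_token)                      # the registered EOS string for this checkpoint
print("[EOS]" in t.all_special_tokens)  # "[EOS]" is not among the special tokens
print(t.tokenize("[EOS]"))              # so it is split into ordinary BPE pieces
```

Because "[EOS]" is not special, it is byte-pair-encoded as plain text, which would explain why swapping in <s> or </s> as raw strings behaves similarly.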
Issue Analytics
- State:
- Created: 3 years ago
- Comments: 7 (4 by maintainers)
Interesting. Happy to look into it if there’s a bug, but otherwise I think this is just a model issue. (Bug = the prediction is very different from the fairseq model for the same input).
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.