Behaviour of ZeroShotClassification using facebook/bart-large-mnli differs between the online demo and my local machine
Environment info
- `transformers` version: 3.4.0
- Platform: Ubuntu 20.04
- Python version: 3.7.7
- PyTorch version (GPU?): 1.6.0 (GPU: Yes)
- Tensorflow version (GPU?): No
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help
Information
Model I am using (Bert, XLNet …): facebook/bart-large-mnli
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQUaD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
First I tried the hosted demo online at huggingface.co, which gives me a very high score of 0.99 for travelling (as expected).
Then I tried to run the code on my local machine, which returns very different scores for all labels (poor scores):
from transformers import pipeline
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
model = AutoModel.from_pretrained("facebook/bart-large-mnli")
zsc = pipeline(task='zero-shot-classification', tokenizer=tokenizer, model=model)
sequences = 'one day I will see the world'
candidate_labels = ['travelling', 'cooking', 'dancing']
results = zsc(sequences=sequences, candidate_labels=candidate_labels, multi_class=False)
print(results)
>>>{'sequence': 'one day I will see the world',
'labels': ['travelling', 'dancing', 'cooking'],
'scores': [0.5285395979881287, 0.2499372661113739, 0.22152313590049744]}
I got this warning message when initializing the model:
model = AutoModel.from_pretrained("facebook/bart-large-mnli")
Some weights of the model checkpoint at facebook/bart-large-mnli were not used when initializing BartModel: ['model.encoder.version', 'model.decoder.version']
- This IS expected if you are initializing BartModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BartModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
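This warning is itself a hint: the checkpoint is being loaded into BartModel, the bare encoder-decoder without a classification head. One way to confirm which class was actually instantiated (a quick sanity check, not part of the original report):

print(type(model).__name__)
# -> 'BartModel'  (the bare model, not 'BartForSequenceClassification')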
Expected behavior
The scores produced by the code on my local machine should be similar to those from the online demo.
Comments

@joeddav: Replace AutoModel with AutoModelForSequenceClassification. The former won't add the sequence classification head, i.e. it will use BartModel instead of BartForSequenceClassification, so the pipeline is trying to use just the outputs of the encoder instead of the NLI predictions in your snippet.

Author's reply: @joeddav that fixed it, thanks!
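For reference, a minimal corrected version of the snippet (a sketch against the same transformers 3.4.0 API used in the report; only the model class changes):

from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# AutoModelForSequenceClassification loads BartForSequenceClassification,
# which includes the classification head the zero-shot pipeline relies on.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")
zsc = pipeline(task='zero-shot-classification', tokenizer=tokenizer, model=model)
sequences = 'one day I will see the world'
candidate_labels = ['travelling', 'cooking', 'dancing']
results = zsc(sequences=sequences, candidate_labels=candidate_labels, multi_class=False)
print(results)
# 'travelling' should now score close to the 0.99 seen in the online demo.

Alternatively, passing the checkpoint name straight to the pipeline, e.g. pipeline('zero-shot-classification', model='facebook/bart-large-mnli'), lets the pipeline resolve the correct model class on its own.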