Pipeline returns inconsistent results when using a non-default model
System Info
Transformers version 4.19.2 Python 3.7.13 Ubuntu 16.04.6 LTS
Who can help?
I’ve noticed that pipeline returns inconsistent results after re-instantiating it when supplying a non-standard model. See the code below.
- What is being returned, and why does it change between instantiations?
- What exactly does pipeline do when you give it a non-default model, or a model not trained for the specific task?
- Since it doesn’t necessarily make sense to use bert-base-uncased for a sentiment-analysis task, should pipeline allow this? I don’t get a warning or error. Is there a recommended way to tell pipeline to fail if the supplied model doesn’t make sense?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
>>> from transformers import pipeline
>>> pipe = pipeline("sentiment-analysis", model="bert-base-uncased")
>>> pipe("This restaurant is awesome")
[{'label': 'LABEL_0', 'score': 0.5899267196655273}]
>>> pipe = pipeline("sentiment-analysis", model="bert-base-uncased")
>>> pipe("This restaurant is awesome")
[{'label': 'LABEL_0', 'score': 0.5623320937156677}]
>>> pipe = pipeline("sentiment-analysis", model="bert-base-uncased")
>>> pipe("This restaurant is awesome")
[{'label': 'LABEL_1', 'score': 0.5405012369155884}]
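The variation goes away if the random initialization is seeded before each instantiation, which points at a randomly initialized component as the cause. A minimal sketch, assuming transformers is installed and the model can be downloaded:

```python
from transformers import pipeline, set_seed

TEXT = "This restaurant is awesome"

# set_seed seeds Python, NumPy, and PyTorch RNGs, so any randomly
# initialized weights come out identical across instantiations.
set_seed(42)
first = pipeline("sentiment-analysis", model="bert-base-uncased")(TEXT)

set_seed(42)
second = pipeline("sentiment-analysis", model="bert-base-uncased")(TEXT)

# The predictions are still meaningless, but now they are reproducible.
print(first == second)
```

Note that seeding only makes the symptom deterministic; it does not make bert-base-uncased a usable sentiment classifier.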
Expected behavior
I would expect pipeline to either fail or emit a warning when given a model not trained for the task.
Issue Analytics
- State:
- Created a year ago
- Comments:11 (7 by maintainers)
Top GitHub Comments
By default the classification head is initialized randomly, and the correct pretrained weights are then placed onto your model. Since those weights are missing from this checkpoint, we simply don’t place them. That’s why the outputs change all the time: the head is different every time.
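This is easy to verify: load the checkpoint into the classification architecture twice and compare the weights. A sketch, assuming the bert-base-uncased checkpoint is available:

```python
import torch
from transformers import AutoModelForSequenceClassification

NAME = "bert-base-uncased"

# Two independent loads: the backbone weights come from the checkpoint,
# while the classification head is freshly (randomly) initialized each time.
m1 = AutoModelForSequenceClassification.from_pretrained(NAME)
m2 = AutoModelForSequenceClassification.from_pretrained(NAME)

# Backbone weights are restored from the checkpoint, so they match.
backbone_equal = torch.equal(
    m1.bert.embeddings.word_embeddings.weight,
    m2.bert.embeddings.word_embeddings.weight,
)
# The classification head has no weights in the checkpoint, so the two
# random initializations differ.
head_equal = torch.equal(m1.classifier.weight, m2.classifier.weight)
print(backbone_equal, head_equal)
```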
@sjgiorgi
I do agree that it’s easy to miss warnings, especially when running setups automatically and serving them; in those cases the warnings might not be readily visible to you.
The real culprit here is that the model architecture you are trying to load is perfectly capable of running the pipeline, but the model weights themselves are missing the layers the architecture is looking for (here, the classification head).
Catching the warning would be the best way to be 100% sure it works that way.
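Rather than parsing the warning text, from_pretrained accepts output_loading_info=True, which additionally returns the lists of missing and mismatched keys; a sketch of failing fast on an incomplete checkpoint:

```python
from transformers import AutoModelForSequenceClassification

# output_loading_info=True returns a dict alongside the model, with keys
# such as "missing_keys", "unexpected_keys", and "mismatched_keys".
model, info = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", output_loading_info=True
)

missing = info["missing_keys"]
if missing:
    # For bert-base-uncased this lists the classification-head weights
    # that the checkpoint does not contain.
    print(f"Checkpoint is missing weights for this architecture: {missing}")
```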
Pinging a core maintainer to see if we have other solutions. My personal idea would be to add a flag that raises a hard error on mismatched weights instead of a warning, and to use that flag in pipelines, because we really don’t want to load an incomplete model from pretrained weights there. It’s a different story in Model.from_pretrained, where it’s actually a desired feature if you intend to finetune.
@sgugger maybe ?