
Pipeline returns inconsistent results when using a non-default model

See original GitHub issue

System Info

  • Transformers version: 4.19.2
  • Python: 3.7.13
  • OS: Ubuntu 16.04.6 LTS

Who can help?

@Narsil

I’ve noticed that pipeline returns inconsistent results after re-instantiating it when supplying a non-default model. See the code below.

  • What is being returned, and why does it change?
  • What exactly does pipeline do when you give it a non-default model, or a model not trained for the specific task? (See the sketch after this list.)
  • Since it doesn’t necessarily make sense to use bert-base-uncased for a sentiment analysis task, should pipeline allow this? I get no warning or error. Is there a recommended way to tell pipeline to fail if the supplied model doesn’t make sense for the task?
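
For reference, my understanding is that the call above is roughly equivalent to building the model and tokenizer explicitly and handing them to pipeline. This is a simplified sketch of that equivalence, not the actual pipeline internals:

>>> # Rough, simplified equivalent of pipeline("sentiment-analysis", model="bert-base-uncased"):
>>> # resolve the task to a sequence-classification architecture, then load the checkpoint into it.
>>> from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
>>> pipe = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)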

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

>>> from transformers import pipeline
>>> pipe = pipeline("sentiment-analysis", model="bert-base-uncased")
>>> pipe("This restaurant is awesome")
[{'label': 'LABEL_0', 'score': 0.5899267196655273}]

>>> pipe = pipeline("sentiment-analysis", model="bert-base-uncased")
>>> pipe("This restaurant is awesome")
[{'label': 'LABEL_0', 'score': 0.5623320937156677}]

>>> pipe = pipeline("sentiment-analysis", model="bert-base-uncased")
>>> pipe("This restaurant is awesome")
[{'label': 'LABEL_1', 'score': 0.5405012369155884}]

Expected behavior

I would expect pipeline to either fail or give a warning message if given a model not trained for the task.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

1 reaction
Narsil commented, Jul 15, 2022

What is that number? How is it being calculated? Why does it change when I re-instantiate the pipeline? Regardless of how all of this should be handled, these answers are not clear from the documentation.

By default the classification head is created randomly, and then the correct weights from the checkpoint are placed onto your model. Since those classification weights are missing from bert-base-uncased, we just don’t place them, and the head keeps its random initialization. That’s why the outputs change all the time: the head is different on every instantiation.
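
A minimal way to see this directly (a sketch that assumes the classifier attribute of BertForSequenceClassification): load the checkpoint twice and compare weights. The head differs across loads, while the pretrained encoder weights match.

>>> import torch
>>> from transformers import AutoModelForSequenceClassification
>>> model_a = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
>>> model_b = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
>>> # The head is newly (randomly) initialized on each load, so it differs:
>>> torch.equal(model_a.classifier.weight, model_b.classifier.weight)
False
>>> # The encoder weights come from the checkpoint, so they are identical:
>>> torch.equal(model_a.bert.embeddings.word_embeddings.weight,
...             model_b.bert.embeddings.word_embeddings.weight)
True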

1 reaction
Narsil commented, Jul 15, 2022

@sjgiorgi

I do agree that it’s easy to miss warnings. When running setups automatically and serving them, for instance, those warnings might not be readily visible to you.

The real culprit here is that the model architecture you are trying to load is perfectly capable of running the pipeline, but the model weights themselves are missing the layers the architecture is looking for (here, the checkpoint has no classification head).

Catching the warning is currently the best way to be 100% sure that the checkpoint actually contains every weight the architecture expects.
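
One way to perform that check programmatically, rather than parsing log output, is the output_loading_info flag of from_pretrained, which reports the missing keys. Turning that into a hard error, as below, is only a sketch of a caller-side guard, not existing pipeline behavior:

>>> from transformers import AutoModelForSequenceClassification
>>> model, info = AutoModelForSequenceClassification.from_pretrained(
...     "bert-base-uncased", output_loading_info=True)
>>> if info["missing_keys"]:  # here: the classification head weights
...     raise ValueError(f"checkpoint is missing weights for {info['missing_keys']}")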

Pinging a core maintainer to see if we have other solutions. My personal idea would be to add a flag that raises a hard error on mismatched weights instead of a warning, and to use that flag in pipelines, because we really don’t want to load an incomplete model there. It’s a different story in Model.from_pretrained on its own, where loading an incomplete model is actually a desired feature if you intend to fine-tune.

@sgugger maybe?

Read more comments on GitHub
