
Logit explosion in MobileBertForNextSentencePrediction example from documentation (and all others tried)


Environment info

  • transformers version: 4.11.3
  • Platform: Darwin-19.6.0-x86_64-i386-64bit
  • Python version: 3.6.8
  • PyTorch version (GPU?): 1.9.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help

@vshampor

Information

Model I am using (Bert, XLNet …): MobileBertForNextSentencePrediction

The problem arises when using:

  • the official example script in the documentation

The task I am working on is:

  • Next Sentence Prediction

To reproduce

Steps to reproduce the behavior:

Run the code from the official example script in the documentation:

>>> from transformers import MobileBertTokenizer, MobileBertForNextSentencePrediction
>>> import torch

>>> tokenizer = MobileBertTokenizer.from_pretrained('google/mobilebert-uncased')
>>> model = MobileBertForNextSentencePrediction.from_pretrained('google/mobilebert-uncased')

>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> next_sentence = "The sky is blue due to the shorter wavelength of blue light."
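>>> # the pair is unrelated, so the correct NSP label is 1 ("sequence B is random")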
>>> encoding = tokenizer(prompt, next_sentence, return_tensors='pt')

>>> outputs = model(**encoding, labels=torch.LongTensor([1]))
>>> loss = outputs.loss
>>> logits = outputs.logits

Printing logits gives tensor([[2.7888e+08, 2.7884e+08]], grad_fn=<AddmmBackward>): enormous values for both classes, with a gap that drives the softmax score for the "is next sentence" class (index 0) to 1. That is the opposite of the correct answer, which is strange for an example taken straight from the documentation.

I ran it on a handful of related prompt/next-sentence pairs, then on a larger set from my own NSP dataset, and saw the same behavior every time: logits of roughly 2e+08 for both classes, with the first class higher in the 3rd or 4th significant figure. Given those magnitudes, the softmax score is 1 for "is the next sentence" (the first class) and 0 for the other, no matter how unrelated the second sentence is.
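To make the saturation concrete, here is a minimal check (a sketch; it assumes the repro snippet above has been run, so outputs is in scope):

>>> import torch
>>> # a gap of ~4e4 between two ~2.8e8 logits saturates softmax: class 0 takes all the mass
>>> torch.softmax(outputs.logits, dim=-1)
tensor([[1., 0.]], grad_fn=<SoftmaxBackward>)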

Expected behavior

For comparison, running the same example through BertForNextSentencePrediction with bert-base-uncased produces tensor([[-3.0729, 5.9056]], grad_fn=<AddmmBackward>). For a pair whose second sentence clearly does not follow the prompt, I would expect MobileBertForNextSentencePrediction with the default pretrained weights to get the answer right and to produce logits in a similar ballpark, not the enormous positive values shown above.
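For reference, the comparison was along these lines (a sketch; it reuses prompt, next_sentence, and torch from the repro snippet, with only the BERT class and checkpoint names swapped in):

>>> from transformers import BertTokenizer, BertForNextSentencePrediction
>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')
>>> encoding = tokenizer(prompt, next_sentence, return_tensors='pt')
>>> outputs = model(**encoding, labels=torch.LongTensor([1]))
>>> outputs.logits  # class 1 ("not next sentence") correctly wins
tensor([[-3.0729,  5.9056]], grad_fn=<AddmmBackward>)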

I posted about this on HuggingFace Hub discussion board, but it got immediately taken down by the bot for some reason. Linking here in case admins approve it: https://discuss.huggingface.co/t/next-sentence-prediction-with-google-mobilebert-uncased-producing-massive-near-identical-logits-10-8-for-its-documentation-example-and-2k-others-tried/10750/1.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
vshampor commented, Oct 26, 2021

Checked the code, and yes: state_dict-wise, MobileBertForPreTraining and MobileBertForNextSentencePrediction are crafted so that the pretraining checkpoint is loadable into the NSP model; the LM head won't be loaded, but it shouldn't get used anyway. My theory doesn't explain the current state of affairs, then: the example should be working, since the NSP head from pretraining should have been transferred into the NSP-specific model. Will try and investigate further.
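One way to sanity-check the weight transfer is to ask from_pretrained which keys failed to load (a sketch; output_loading_info is an existing from_pretrained flag, and the interpretation in the comments is an assumption about what the keys would mean here):

>>> from transformers import MobileBertForNextSentencePrediction
>>> model, info = MobileBertForNextSentencePrediction.from_pretrained(
...     'google/mobilebert-uncased', output_loading_info=True)
>>> info['missing_keys']     # NSP-head weights absent from the checkpoint would appear here
>>> info['unexpected_keys']  # checkpoint weights the NSP model drops (e.g. the LM head)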

0 reactions
github-actions[bot] commented, Nov 19, 2021

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
