
GPT-J float16 model output stopping after first word

See original GitHub issue

Environment info

  • transformers version: 4.11.2
  • Platform: Linux-5.4.0-1045-aws-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • PyTorch version (GPU?): 1.9.1+cu102 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help

Possibly @StellaAthena?

Information

Model I am using (Bert, XLNet …): EleutherAI/gpt-j-6B @ float16

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

The task I am working on is contextual question answering. The model responds correctly to questions without a context; however, the output stops after the first word when a context is present. Snippet to reproduce the behaviour:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_fp16 = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16).to('cuda')
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

prompt = """Please answer the question according to the above context.
===
Context: The United Kingdom of Great Britain and Northern Ireland, commonly known as the United Kingdom (UK) or Britain, is a sovereign country in north-western Europe, off the north-western coast of the European mainland. The United Kingdom includes the island of Great Britain, the north-eastern part of the island of Ireland, and many smaller islands within the British Isles. Northern Ireland shares a land border with the Republic of Ireland. Otherwise, the United Kingdom is surrounded by the Atlantic Ocean, with the North Sea to the east, the English Channel to the south and the Celtic Sea to the south-west, giving it the 12th-longest coastline in the world. The Irish Sea separates Great Britain and Ireland. The total area of the United Kingdom is 93,628 square miles.
===
Q: What surrounds the UK?
A: Atlantic Ocean; North Sea; English Channel; Celtic Sea
Q: What does the UK include?
A:"""
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to('cuda')
gen_tokens = model_fp16.generate(input_ids, do_sample=True, top_p=1.0, temperature=0.00001, max_length=100)
result = tokenizer.batch_decode(gen_tokens)[0]
completion = result[len(prompt):]
if '\n' in completion:
    # output first row only
    completion = completion[:completion.index('\n')]

print(completion.strip())

Expected behaviour

The above snippet outputs only the first word, Great, instead of the expected Great Britain and Northern Ireland (which is what the float32 model produces, and can also be seen live at https://6b.eleuther.ai/).

Removing the context by replacing prompt with the following value makes the model output a full phrase.

prompt = """Q: What surrounds the UK?
A: Atlantic Ocean; North Sea; English Channel; Celtic Sea
Q: What does the UK include?
A:"""

Output: England, Scotland, Wales, Northern Ireland, Isle of Man, Channel Islands

I have considered the possibility that this is a limitation of the float16 model, but the fact that the first word is guessed correctly suggests the output is being stopped prematurely somewhere in the code.
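One reading of the symptom (my own observation, not stated in the thread): in transformers, max_length caps the total sequence length, prompt tokens included, so a long context consumes most of the generation budget while a short prompt leaves plenty. A minimal sketch of that arithmetic, with illustrative (not measured) token counts:

```python
def new_token_budget(prompt_len: int, max_length: int) -> int:
    """Tokens generate() may still emit when max_length caps the TOTAL
    sequence (prompt + completion), as transformers does by default."""
    return max(0, max_length - prompt_len)

# Illustrative figures: the context-bearing prompt is a few hundred tokens,
# the context-free prompt only a few dozen.
print(new_token_budget(230, 100))  # long prompt: budget is already exhausted -> 0
print(new_token_budget(40, 100))   # short prompt: room for a full answer -> 60
```

This is why switching from max_length to max_new_tokens (which counts only the completion) is the natural fix, as the comments below discuss.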

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

1 reaction
aphedges commented on Oct 8, 2021

I’m having the same issue, and installing from the linked PR’s branch allows max_new_tokens to work for me without any warnings.

1 reaction
patrickvonplaten commented on Oct 7, 2021

If max_new_tokens is passed with a non-None value, it should overwrite max_length if max_length is not passed as an argument to generate(). This only affects people who actually pass max_new_tokens - so this is indeed a bug!

Will make a PR to fix it!
