
GPT-J float16 model output stopping after first word

See original GitHub issue

Environment info

  • transformers version: 4.11.2
  • Platform: Linux-5.4.0-1045-aws-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • PyTorch version (GPU?): 1.9.1+cu102 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help

Possibly @StellaAthena?

Information

Model I am using (Bert, XLNet …): EleutherAI/gpt-j-6B @ float16

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

The task I am working on is contextual question answering. The model responds correctly to questions without a context; however, the output stops after the first word when a context is present. Snippet to reproduce the behaviour:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_fp16 = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16).to('cuda')
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

prompt = """Please answer the question according to the above context.
===
Context: The United Kingdom of Great Britain and Northern Ireland, commonly known as the United Kingdom (UK) or Britain, is a sovereign country in north-western Europe, off the north-western coast of the European mainland. The United Kingdom includes the island of Great Britain, the north-eastern part of the island of Ireland, and many smaller islands within the British Isles. Northern Ireland shares a land border with the Republic of Ireland. Otherwise, the United Kingdom is surrounded by the Atlantic Ocean, with the North Sea to the east, the English Channel to the south and the Celtic Sea to the south-west, giving it the 12th-longest coastline in the world. The Irish Sea separates Great Britain and Ireland. The total area of the United Kingdom is 93,628 square miles.
===
Q: What surrounds the UK?
A: Atlantic Ocean; North Sea; English Channel; Celtic Sea
Q: What does the UK include?
A:"""
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to('cuda')
gen_tokens = model_fp16.generate(input_ids, do_sample=True, top_p=1.0, temperature=0.00001, max_length=100)
result = tokenizer.batch_decode(gen_tokens)[0]
completion = result[len(prompt):]
if '\n' in completion:
    # output first row only
    completion = completion[:completion.index('\n')]

print(completion.strip())

Expected behaviour

The above snippet outputs only the first word, Great, instead of the expected Great Britain and Northern Ireland (which is what the float32 model produces, and can also be seen live at https://6b.eleuther.ai/).

Removing the context by replacing prompt with the following value makes the model output a full phrase.

prompt = """Q: What surrounds the UK?
A: Atlantic Ocean; North Sea; English Channel; Celtic Sea
Q: What does the UK include?
A:"""

Output: England, Scotland, Wales, Northern Ireland, Isle of Man, Channel Islands

I have considered the possibility that this is a limitation of the float16 model, but the fact that the first word is guessed correctly suggests the output is being stopped prematurely somewhere in the code.
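One reading of the symptom (my own observation, not stated in the thread): in transformers, max_length caps the total sequence length, prompt tokens included, so a long context consumes most of the generation budget while a short prompt leaves plenty. A minimal sketch of that arithmetic, with illustrative (not measured) token counts:

```python
def new_token_budget(prompt_len: int, max_length: int) -> int:
    """Tokens generate() may still emit when max_length caps the TOTAL
    sequence (prompt + completion), as transformers does by default."""
    return max(0, max_length - prompt_len)

# Illustrative figures: the context-bearing prompt is a few hundred tokens,
# the context-free prompt only a few dozen.
print(new_token_budget(230, 100))  # long prompt: budget is already exhausted -> 0
print(new_token_budget(40, 100))   # short prompt: room for a full answer -> 60
```

This is why switching from max_length to max_new_tokens (which counts only the completion) is the natural fix, as the comments below discuss.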

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

1 reaction
aphedges commented on Oct 8, 2021

I’m having the same issue, and installing from the linked PR’s branch allows max_new_tokens to work for me without any warnings.

1 reaction
patrickvonplaten commented on Oct 7, 2021

If max_new_tokens is passed with a non-None value, it should overwrite max_length if max_length is not passed as an argument to generate(). This only affects people who actually pass max_new_tokens - so this is indeed a bug!

Will make a PR to fix it!
