GPT-J float16 model output stopping after first word
Environment info
- transformers version: 4.11.2
- Platform: Linux-5.4.0-1045-aws-x86_64-with-glibc2.29
- Python version: 3.8.10
- PyTorch version (GPU?): 1.9.1+cu102 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help
Possibly @StellaAthena?
Information
Model I am using (Bert, XLNet …): EleutherAI/gpt-j-6B @ float16
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQuAD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
The task I am working on is contextual question answering. The model seems to respond correctly to questions without a context, however the output will stop after the first word when a context is present. Snippet to reproduce the behaviour:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_fp16 = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16).to('cuda')
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
prompt = """Please answer the question according to the above context.
===
Context: The United Kingdom of Great Britain and Northern Ireland, commonly known as the United Kingdom (UK) or Britain, is a sovereign country in north-western Europe, off the north-western coast of the European mainland. The United Kingdom includes the island of Great Britain, the north-eastern part of the island of Ireland, and many smaller islands within the British Isles. Northern Ireland shares a land border with the Republic of Ireland. Otherwise, the United Kingdom is surrounded by the Atlantic Ocean, with the North Sea to the east, the English Channel to the south and the Celtic Sea to the south-west, giving it the 12th-longest coastline in the world. The Irish Sea separates Great Britain and Ireland. The total area of the United Kingdom is 93,628 square miles.
===
Q: What surrounds the UK?
A: Atlantic Ocean; North Sea; English Channel; Celtic Sea
Q: What does the UK include?
A:"""
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to('cuda')
gen_tokens = model_fp16.generate(input_ids, do_sample=True, top_p=1.0, temperature=0.00001, max_length=100)
result = tokenizer.batch_decode(gen_tokens)[0]
completion = result[len(prompt):]
if '\n' in completion:
    # output the first line only
    completion = completion[:completion.index('\n')]
print(completion.strip())
Expected behaviour
The above snippet outputs only the first word: Great
instead of the expected Great Britain and Northern Ireland
(as happens with the float32 model, which can also be tried live at https://6b.eleuther.ai/).
Removing the context by replacing prompt with the following value makes the model output a full phrase.
prompt = """Q: What surrounds the UK?
A: Atlantic Ocean; North Sea; English Channel; Celtic Sea
Q: What does the UK include?
A:"""
Output: England, Scotland, Wales, Northern Ireland, Isle of Man, Channel Islands
I have considered the possibility that this is simply a limitation of the float16 model, but the fact that the first word is guessed correctly makes me think the output is being stopped prematurely somewhere in the code.
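One possible explanation for the premature stop (an inference from how generate() works, not something confirmed above) is that max_length caps the total sequence, prompt included, rather than the completion alone. The contextual prompt here is much longer than the short one, so max_length=100 leaves little or no budget for new tokens. A minimal sketch of that arithmetic, with hypothetical token counts:

```python
# Hedged sketch: generate()'s max_length caps the TOTAL sequence
# (prompt + new tokens), not just the completion. The helper and the
# token counts below are illustrative assumptions, not measured values.
def remaining_new_tokens(prompt_len: int, max_length: int) -> int:
    """How many new tokens generation may still emit under a max_length cap."""
    return max(0, max_length - prompt_len)

# A context-heavy prompt can already exceed the cap...
print(remaining_new_tokens(220, 100))  # 0 -> generation stops almost at once
# ...while a short prompt leaves ample room for a full answer.
print(remaining_new_tokens(40, 100))   # 60
```

If this is the cause, passing max_new_tokens (which counts only the completion) instead of max_length would sidestep the truncation regardless of prompt length.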
Issue Analytics
- Created 2 years ago
- Comments: 7 (6 by maintainers)
I'm having the same issue, and installing from the linked PR's branch allows `max_new_tokens` to work for me without any warnings.

If `max_new_tokens` is passed with a non-None value, it should overwrite `max_length` if `max_length` is not passed as an argument to `generate()`. This only affects people who actually pass `max_new_tokens` - so this is indeed a bug! Will make a PR to fix it!