
Using "generate()" to generate text with a causal language model like GPT-2 repeats the input at the beginning

See original GitHub issue

System Info

This issue does not depend on any particular system configuration.

Who can help?

@patrickvonplaten, @Narsil, @gante

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

### Reproduction

There are just two kinds of examples illustrated in your code.

1. Multinomial sampling:

```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("gpt2")
>>> prompt = "Today I believe we can finally"
>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids
>>> # sample up to 30 tokens
>>> torch.manual_seed(0)  # doctest: +IGNORE_RESULT
>>> outputs = model.generate(input_ids, do_sample=True, max_length=30)
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
['Today I believe we can finally get rid of discrimination," said Rep. Mark Pocan (D-Wis.).\n\n"Just look at the']
```

2. Beam-search decoding:

```python
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
>>> tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-de")
>>> sentence = "Paris is one of the densest populated areas in Europe."
>>> input_ids = tokenizer(sentence, return_tensors="pt").input_ids
>>> outputs = model.generate(input_ids, num_beams=5)
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
['Paris ist eines der dichtesten besiedelten Gebiete Europas.']
```

### Expected behavior

I found that when I use a causal language model with "generate()" to produce a text output from a given text input, there are two situations: 1. if the model is good enough and the input is reasonable, the output repeats the input verbatim at the beginning; 2. if the model is not good enough, or the input is not reasonable, the output still tries to repeat the input but cannot reproduce it exactly at the beginning.

However, what I want is an output that does not repeat the input, and I don't know which parameter to set to achieve this. I know that this kind of language model generates output token by token, but I wonder whether repeating the input in the output can be avoided with your code.
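For future readers with the same question: since generate() returns the prompt ids followed by the newly generated ids, the echo can be dropped by slicing the output at the prompt length. A minimal sketch with dummy tensors (the helper name and the example ids are made up for illustration; with a real model you would pass `outputs` and `input_ids.shape[-1]`):

```python
import torch

def strip_prompt_tokens(output_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Drop the first `prompt_len` tokens (the echoed prompt) from each sequence."""
    return output_ids[:, prompt_len:]

# Dummy stand-ins: pretend the prompt was 3 tokens and generate() returned 7.
prompt_ids = torch.tensor([[10, 11, 12]])
generated = torch.tensor([[10, 11, 12, 20, 21, 22, 23]])  # generate() echoes the prompt
new_tokens = strip_prompt_tokens(generated, prompt_ids.shape[-1])
print(new_tokens.tolist())  # [[20, 21, 22, 23]]
```

The new-token ids can then be passed to `tokenizer.batch_decode` to get only the continuation text.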

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

Narsil commented, Oct 28, 2022

Just for future readers:

  • pipeline: from raw string to raw string
  • generate: from input_ids tensor to output_ids tensor

generate doesn’t have an option to “cut” the input_ids; it operates on exactly what the model sees, which is all of the ids. pipeline, on the other hand, is designed to work out of the box as much as possible for non-ML users, so it sometimes adds some magic for you (like cutting the input here, which is annoying when writing an autocomplete workflow, for instance).
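The distinction above can be seen directly from the text-generation pipeline's `return_full_text` flag, which asks its postprocess step to strip the prompt; generate() itself has no such switch. A sketch using the gpt2 checkpoint from the examples above (greedy decoding; the exact continuation text depends on the model, so it is not shown here):

```python
from transformers import pipeline

# Build a "text-generation" pipeline around the gpt2 checkpoint used earlier.
generator = pipeline("text-generation", model="gpt2")
prompt = "Today I believe we can finally"

# return_full_text=False makes the pipeline return only the continuation,
# without the echoed prompt.
result = generator(prompt, max_new_tokens=10, do_sample=False, return_full_text=False)
completion = result[0]["generated_text"]
print(repr(completion))
```

With `return_full_text=True` (the default), the same call would return the prompt followed by the continuation.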

Zcchill commented, Oct 28, 2022

You might want to use

Many thanks. I’ve read through the source code of “pipeline” in detail, and it seems that it calls “generate()” and then does the process mentioned above through the following code in “postprocess()”:

```python
if return_type == ReturnType.FULL_TEXT:
    all_text = prompt_text + text[prompt_length:]
else:
    all_text = text[prompt_length:]
```

Actually, I was confused about the difference between “pipeline(‘text-generation’)” and “generate()” before; now it is much clearer. I’ll close the issue. Have a good day.
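The quoted postprocess() logic can be replayed at the string level. A simplified stand-in, where `prompt_length` is taken as the character length of the prompt (the actual pipeline computes this offset from the decoded prompt tokens):

```python
# Simplified stand-in for the pipeline's postprocess step: slice the fully
# decoded text at the length of the decoded prompt.
def postprocess(prompt_text: str, text: str, full_text: bool) -> str:
    prompt_length = len(prompt_text)
    if full_text:                      # corresponds to ReturnType.FULL_TEXT
        return prompt_text + text[prompt_length:]
    return text[prompt_length:]        # prompt stripped from the output

prompt = "Today I believe we can finally"
decoded = "Today I believe we can finally get rid of discrimination"
print(repr(postprocess(prompt, decoded, full_text=False)))
# ' get rid of discrimination'
```

This is exactly the "magic" that generate() does not perform: the model's raw output always begins with the prompt tokens, and only the pipeline's postprocessing removes them.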
