
How to generate sentences in batches, instead of generating sentences one by one


After I fine-tune GPT-2, I want to use it to generate sentences in batches instead of one by one.

So I tried to modify the code of examples/text-generation/run_generation.py.

Line 239 of run_generation.py reads: encoded_prompt = tokenizer.encode(prompt_text, add_special_tokens=False, return_tensors="pt"). Here prompt_text is a str, but when I pass List[str] data instead, the call always returns 50256.

But looking at the source code, prompt_text can be a str, List[str], or List[int].

I tested this separately, and for the token ids it always returns 50256.

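Roughly what I ran (a simplified sketch of the test; the prompt strings are just examples):

    from transformers import GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

    # A single string is tokenized normally.
    print(tokenizer.encode("Hello world", add_special_tokens=False))

    # A list of strings comes back as 50256 (GPT-2's <|endoftext|>/unk id)
    # for every element instead of the expected token ids.
    print(tokenizer.encode(["Hello world", "How are you"], add_special_tokens=False))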

So, does prompt_text have to be a str?

What modifications should I make to generate sentences in batches using examples/text-generation/run_generation.py?

Looking forward to your reply!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
patrickvonplaten commented, Oct 8, 2020

Yes! Please take a look at this test, which does batch=4 generation for summarization using T5: https://github.com/huggingface/transformers/blob/55cb2ee62eb482787cff17585955f7193fe35dfa/tests/test_modeling_t5.py#L559
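For the GPT-2 setup in the original question, a minimal sketch along the same lines might look like this (the model name, prompts, and generation settings here are illustrative, not taken from the linked test):

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # GPT-2 has no pad token, so reuse EOS, and pad on the left so that
    # generation continues from the real end of each prompt.
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "left"

    prompts = ["The meaning of life is", "Once upon a time"]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)

    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=40,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

    for text in tokenizer.batch_decode(output_ids, skip_special_tokens=True):
        print(text)

Left padding matters here: with the default right padding, the model would be asked to continue from pad tokens rather than from the end of each prompt.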

1 reaction
parthplc commented, Oct 8, 2020

Hey @patrickvonplaten, is batch generation available for T5ForConditionalGeneration?


Top Results From Across the Web

  • Handling multiple sequences - Hugging Face Course: Batching allows the model to work when you feed it multiple sentences. Using multiple sequences is just as simple as building a batch...
  • Deep N-Grams: Batch Generation | Neurotic Networking: The generator converts text lines (sentences) into numpy arrays of integers padded ... While True loop: this will yield one batch at a...
  • Practical text generation using GPT-2, LSTM and Markov Chain: Its goal is to generate meaningful phrases and sentences in the form of human-written text. It has a wide range of use cases:...
  • SentenceTransformer — Sentence-Transformers documentation: Loads or create a SentenceTransformer model, that can be used to map sentences / text to embeddings. ... Initializes internal Module state, shared...
  • How to get translations of one batch of sentences after ...: The model Helsinki-NLP/opus-mt-es-en translates from Spanish to English. Please have a look at the examples below:
