
How to generate sentences in batches, instead of generating sentences one by one


After I fine-tune GPT-2, I want to use it to generate sentences in batches instead of one by one.

So I tried to modify the code of examples/text-generation/run_generation.py.

Line 239 of run_generation.py reads: encoded_prompt = tokenizer.encode(prompt_text, add_special_tokens=False, return_tensors="pt"). Here prompt_text is a str, but when I pass List[str] data instead, the call always returns 50256.

But looking at the source code, prompt_text can be a str, List[str], or List[int].

I tested this separately, and for the token ids it always returns 50256.

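Roughly what I ran (a simplified sketch of the test; the prompt strings are just examples):

    from transformers import GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

    # A single string is tokenized normally.
    print(tokenizer.encode("Hello world", add_special_tokens=False))

    # A list of strings comes back as 50256 (GPT-2's <|endoftext|>/unk id)
    # for every element instead of the expected token ids.
    print(tokenizer.encode(["Hello world", "How are you"], add_special_tokens=False))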

So, does prompt_text have to be a str?

What modifications should I make to generate sentences in batches using examples/text-generation/run_generation.py?

Looking forward to your reply!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
patrickvonplaten commented, Oct 8, 2020

Yes! Please take a look at this test, which does batch=4 generation for summarization using T5: https://github.com/huggingface/transformers/blob/55cb2ee62eb482787cff17585955f7193fe35dfa/tests/test_modeling_t5.py#L559
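For the GPT-2 setup in the original question, a minimal sketch along the same lines might look like this (the model name, prompts, and generation settings here are illustrative, not taken from the linked test):

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # GPT-2 has no pad token, so reuse EOS, and pad on the left so that
    # generation continues from the real end of each prompt.
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "left"

    prompts = ["The meaning of life is", "Once upon a time"]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)

    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=40,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

    for text in tokenizer.batch_decode(output_ids, skip_special_tokens=True):
        print(text)

Left padding matters here: with the default right padding, the model would be asked to continue from pad tokens rather than from the end of each prompt.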

1 reaction
parthplc commented, Oct 8, 2020

Hey @patrickvonplaten, is batch generation available for T5ForConditionalGeneration?


Top Results From Across the Web

  • Handling multiple sequences - Hugging Face Course: Batching allows the model to work when you feed it multiple sentences. Using multiple sequences is just as simple as building a batch...
  • Deep N-Grams: Batch Generation | Neurotic Networking: The generator converts text lines (sentences) into numpy arrays of integers padded ... While True loop: this will yield one batch at a...
  • Practical text generation using GPT-2, LSTM and Markov Chain: Its goal is to generate meaningful phrases and sentences in the form of human-written text. It has a wide range of use cases:...
  • SentenceTransformer — Sentence-Transformers documentation: Loads or create a SentenceTransformer model, that can be used to map sentences / text to embeddings. ... Initializes internal Module state, shared...
  • How to get translations of one batch of sentences after ...: The model Helsinki-NLP/opus-mt-es-en translates from Spanish to English. Please have a look at the examples below:
